[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-20 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074929#comment-18074929
 ] 

Julian Hyde commented on CALCITE-7463:
--

My view is that the planner could run in an ARBITRARY_REPEATABLE mode where two 
expressions that are arbitrary and structurally equal are considered to be 
equivalent (i.e. always return the same results). In that mode, rewrites such 
as the above would be valid.

If a rule needs ARBITRARY_REPEATABLE, it would have to declare it. The rule 
would not fire in a planner where ARBITRARY_REPEATABLE is false.

We would need to classify expressions into deterministic, arbitrary, random. 
(We might need to refine the distinction because an expression like 'select * 
from emp order by deptno' has deterministic rows but arbitrary ordering; 
'select * from emp order by deptno limit 3' has arbitrary rows and ordering.)

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-20 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074837#comment-18074837
 ] 

Zhen Chen commented on CALCITE-7463:


My view is that we shouldn't pile one uncertainty on top of another. If we 
cannot reach a consensus, I think we should simply set this issue aside for the 
time being—because, in my opinion, this kind of optimization is pointless.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Steven Phillips (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074386#comment-18074386
 ] 

Steven Phillips commented on CALCITE-7463:
--

That's actually not quite what I'm saying. The fact that it probably does it 
anyway is not the foundation of my argument as to why it's correct. My argument 
is basically:

Limit without order by means: return two rows, it doesn't matter which ones or 
what method is used to choose them.
Therefore, choosing the two rows for the subqueries such that they return the 
exact same two rows in each instance is a valid optimizer choice, as it meets 
the requirements of the request.

With the random() function, there is a different meaning. It doesn't mean give 
me any value, I don't care what or how you decided. It explicitly means, give 
me a value with random probability over a uniform distribution. So, while it 
could be any value, I do care how it's done, or at least I expect the result to 
be generated in a way that is (pseudo)random.

Another analogy would be the ANY aggregate function, which some engines 
support. It's arbitrary, but not random, so it would be valid to transform the 
query:

{code:sql}
select any(x) from t
{code}

to 

{code:sql}
select min(x) from t
{code}

whereas:

{code}
select x from t order by random() limit 1
{code}

would not be valid to transform this way, since the random() function has the 
expectation of (pseudo)randomness. 

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074382#comment-18074382
 ] 

Julian Hyde commented on CALCITE-7463:
--

I see. You're saying that we are allowed to "cook the books" and assume that 
the two scans return rows in the same order - because it's PROBABLY what would 
happen anyway.

The argument about nondeterministic vs random gets rather philosophical, so 
could we couch it in terms of probability? Namely, it's not valid to optimize 
'select random() except select random()' to the empty relation because, while 
it's possible that both calls to random() return the same number, it's unlikely.

Or I guess we could say that the order of scans is *arbitrary* and therefore we 
*choose* that they will return the same.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Steven Phillips (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074378#comment-18074378
 ] 

Steven Phillips commented on CALCITE-7463:
--

The point of optimizing queries is for them to perform faster with less 
resources. It's the goal of a query optimizer. And the unoptimized query in 
this case could potentially be very expensive, e.g. if the limit is much 
higher, and there are more than just 2 instances. And I'm generally not a fan 
of removing optimizations that give correct results.

My point is, in this particular case, it is entirely, 100% up to the optimizer 
to decide what to return.  When I request 2 arbitrary rows from a relation, 
without specifying an order by, it is valid for the optimizer to choose any 
plan whatsoever that satisfies that request. If it decides that a plan that 
returns the exact same result set every time is the best plan, that is 
perfectly fine. If the planner decides some other plan, which potentially 
returns different results each time is best, that's also valid. The planner can 
also choose a plan that returns the same result every time in some cases, and a 
plan that returns different results every time in other cases, and that's still 
fine. The only requirement is that planner gives a plan that returns results 
that are consistent with what the query requests.

So I disagree with [~julianhyde] slightly, in that, I think this optimization 
is valid if we don't consider CTEs/subqueries to be repeatable, which is 
different from how we should handle the RANDOM() function, as RANDOM() comes 
with an expectation of randomness, whereas LIMIT means give me any result that 
matches, I don't care how.

In other words, I think


{code:sql}
select random()
union
select random()
{code}

should not be reduced, but if we treat it as a REPEATABLE CTE like Julian is 
suggesting, then it could be reduced, since that's what repeatable cte means.

but 

{code:sql}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2)
{code}

has no expectation of randomness, so reducing it is valid either way.

Now, if we decide that it makes things simpler to handle by treating the 
"arbitrary" queries like LIMIT the same as "explcit random" queries, that could 
be a valid decision we could make. 

And I still am confused why anyone would want to make sure it doesn't reduce 
this case. In many cases, this query would return the same result anyway, e.g. 
when the table is small, it's likely many engines would start reading from the 
beginning of the underlying file. If the goal is to force some 
randomness/arbitrariness into the execution engine, this isn't the way to do it.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074377#comment-18074377
 ] 

Julian Hyde commented on CALCITE-7463:
--

[~jensen], I would agree if we were talking about {{UNION ALL}}. But if the two 
{{LIMIT 2}} subqueries produce the same rows, the {{UNION}} will produce the 
same rows.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074376#comment-18074376
 ] 

Zhen Chen commented on CALCITE-7463:


[~julianhyde] Thank you. You always provide solutions when problems arise, 
rather than letting them fall into endless loops. I agree that supporting a 
"repeatable" mode would be more rigorous. However, the example happens to 
demonstrate a special case. The execution result of the current example is 
actually similar to a "repeatable" scenario, but the rewritten result is not 
consistent—no matter what number follows the "limit" in "select distinct...". 
Nevertheless, I agree that we could adopt a "repeatable mode".

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074373#comment-18074373
 ] 

Zhen Chen commented on CALCITE-7463:


[~sphillips] Actually, whether it's "random" or "nondeterministic," what I want 
to express is: what's the point of optimizing such a situation? Wouldn't it be 
better to preserve the original uncertainty of an "uncertain" scenario? The 
logic that "{*}nondeterministic"{*} + "nondeterministic" = a subset of 
"{*}nondeterministic"{*} doesn't hold, in my view.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074311#comment-18074311
 ] 

Julian Hyde commented on CALCITE-7463:
--

I agree that there are both *nondeterministic* operations and truly *random* 
operations, but for these purposes we really care whether a data set is 
*repeatable*.

In some databases (such as BigQuery), CTEs are repeatable. If I write
{code}
with T as 
select * from T
except
select * from T
{code}

Calcite's default behavior for CTEs is to inline them (as if they were views), 
so there is some chance that the two executions will produce different rows. 
(In CALCITE-6716, I proposed a {{REPEATABLE}} keyword to explicitly declare 
whether a CTE is repeatable.)

I think we need to parameterize the environment executing planner rules with a 
boolean mode. The mode determines whether two RelNodes that are deeply-equal 
are considered to be repeatable. It is valid to collapse
{code}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2) 
{code}
into
{code}SELECT DISTINCT mgr, comm FROM emp LIMIT 2{code}
if and only if the mode is true.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Steven Phillips (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074286#comment-18074286
 ] 

Steven Phillips commented on CALCITE-7463:
--

You should be careful about how using the word random. In statistics, random 
means something.

The result of the query:

{code:sql}
(SELECT id FROM t limit 2)
UNION
(SELECT id FROM t limit 2)
{code}

is, in general, not random, but rather non-deterministic. Meaning that, within 
the space of all possible correct results, there is no expectation of which 
result will be returned. And this could be implementation specific.

The RANDOM() function, on the other hand, implies statistical randomness. The 
expectation is that each instance of this function will potentially return a 
different result, and the probability of any specific result is given according 
to a specified probability distribution (in this case uniform). 

In your random() * random() example, the probability distribution of the 
product is not uniform over the interval [0,1), so the two expressions are not 
equivalent.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-17 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074232#comment-18074232
 ] 

Zhen Chen commented on CALCITE-7463:


Sorry for delay reply. Please see this case.
CREATE TABLE t (
  id INTEGER,
  name VARCHAR(10)
);
​
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (1, 'b');
INSERT INTO t VALUES (1, 'c');
INSERT INTO t VALUES (2, 'd');
INSERT INTO t VALUES (2, 'e');
INSERT INTO t VALUES (3, 'f');
INSERT INTO t VALUES (3, 'g');
INSERT INTO t VALUES (3, 'h');
INSERT INTO t VALUES (3, 'i');
​
(SELECT id FROM t limit 2)
UNION
(SELECT id FROM t limit 2);
--  id 
-- 
--   1
-- (1 row)
​
select distinct id from t limit 2;
--  id 
-- 
--   3
--   2
-- (2 rows)
1. UNION with LIMIT in subqueries:

  Each subquery's {{LIMIT }}executes first → may get duplicate rows

  {{UNION }}removes duplicates afterward → final result may have fewer rows

2. DISTINCT with LIMIT:

  {{DISTINCT }}executes first → gets unique values

  {{LIMIT }}applies to distinct results → stable row count

Execution order of operations causes different outcomes. Please note that if 
table t is not strictly ordered, the result of the first SQL query will be 
unstable, possibly returning 1 to 3 rows, while the result of the second SQL 
query is stable with 2 rows.

If we can accept this random scenario, meaning the logic that "random" + 
"random" = "random" holds true, let's look at the following example:
select id from t limit 10 * random();
select id from t limit 10 * random() * random();
Are these two SQL statements equivalent?

If the above equation holds, I believe these two SQL statements are equivalent. 
Is my conclusion wrong?

If the logic that "random" + "random" = "random" is incorrect, can the 
rewriting of UNION be considered wrong?

Is it reasonable to add "uncertainty" on top of "uncertainty"?

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-14 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073404#comment-18073404
 ] 

Zhen Chen commented on CALCITE-7463:


I recall that we fixed an issue about whether rand() can be pushed down. I'll 
look for the Jira ticket later. So, if we consider that "random" + "random" = 
"random" is valid, can rand() be freely pulled up or pushed down?

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-14 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073399#comment-18073399
 ] 

Alessandro Solimando commented on CALCITE-7463:
---

[~sphillips], another example involving an ORDER BY was proposed here, but we 
it's still not clear to me if this one is problematic either. I agree with your 
comment, though, as long as we pick one legal alternative and we reduce cost, 
the planner it's doing its job.

Thanks [~zabetak] for pointing to CALCITE-4160, which has a lot of interesting 
details.

I re-took a look at the rule details, and I am pretty convinced that there is 
no issue with this rule, as per the SQL standard, unless we can find a concrete 
counter-example, I think we can close with "Not a bug" for now and revisit if 
any problem surfaces later, wdyt [~jensen]?

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-14 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073365#comment-18073365
 ] 

Stamatis Zampetakis commented on CALCITE-7463:
--

As others mentioned already, the example query is non deterministic so we 
cannot really say that transformation is invalid. I would also be curious to 
understand better the motivation behind this change. Which use-case will 
benefit from the "fix"?

Another point to keep in mind is that ORDER BY and LIMIT in sub-queries are 
optional features from a SQL standard perspective so some systems may not even 
allow such constructs (see CALCITE-4160 for more details around ORDER BY).

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-13 Thread Steven Phillips (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073355#comment-18073355
 ] 

Steven Phillips commented on CALCITE-7463:
--

I'm still of the opinion that this rewrite is safe, and I haven't seen any 
examples where this rewrite gives an incorrect result. The fact that the 
rewrite narrows the space of possible correct results to a smaller space of 
correct results does not make the results incorrect. Part of my reasoning is 
based on the fact that, given a LIMIT without an ORDER BY, there are no 
expectations about which rows get returned, so the optimizer making a choice to 
return rows in a way that satisfy the limit but also reduce the cost of the 
query is in fact a good thing.

I want to know what the real problem we're trying to solve is. Just one example 
where this rule results in an incorrect result, or even undesirable plan.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-13 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073345#comment-18073345
 ] 

Zhen Chen commented on CALCITE-7463:


[~asolimando] Thank you for raising this point. Regarding the boundary issue, 
I’d like to double-check: if all subqueries in the UNION are of the form like
{{{}SELECT mgr FROM emp ORDER BY mgr LIMIT 2{}}}, I believe rewriting is 
possible.
Does the {{isRowLimited }}flag alone not sufficiently determine this?

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-13 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073297#comment-18073297
 ] 

Alessandro Solimando commented on CALCITE-7463:
---

Summarizing my thoughts after the discussion here and reviewing the PR.

I agree that LIMIT without ORDER BY is nondeterministic. However, collapsing 
two independently-evaluated LIMIT 2 branches into a single one strictly narrows 
the set of possible outcomes (at most 2 rows instead of up to 4 before dedup).
Being conservative as [~jensen] suggests seems safer to me. 

If we agree on this, I suggest we introduce a new metadata property (e.g., 
{_}isRowLimited{_}), capturing whether a subtree depends on row-limiting 
operations, and use that to detect and bail out on the pattern.

Re. what Julian says around CTEs: this seems strongly related to [~zabetak]'s 
work on {_}RelCommonExpressionSuggester{_}, I wonder if he has given the topic 
a thought. IIRC _RelCommonExpressionBasicSuggester_ considers _deepEquals_ 
subtrees as identical and replaceable with a single CTE. Would it lead to the 
same situation we have now?

Since Julian cites BigQuery as an example where this rewrite would be legal, 
but there are potentially others where it is not, maybe we should have a 
"method" in an interface, that downstream systems can override to decide, given 
two (identical) expressions, if they are the same or not?

At that point the rule would use that deciding point and bail out for cases 
when the collapsing wouldn't be safe.

WDYT?

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-07 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071650#comment-18071650
 ] 

Zhen Chen commented on CALCITE-7463:


This rule actually leverages Calcite operator’s deepEquals for comparison—two 
subexpressions are considered to have the same origin as long as they are 
structurally identical. Based on this "same origin" assumption, rewrite 
optimizations are applied. Before the rewrite, there are two separate sub-plans 
(if there are two SCANs, they can execute independently), but after the 
rewrite, the two identical sub-plans are merged into one.  

Regarding the behavior you mentioned—similar to CTE—I’m not certain how we 
should define or enforce it. Perhaps we can establish a specification, but I’m 
unsure how to design such a rewrite rule in a way that everyone would find 
reasonable and consistent.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-06 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071595#comment-18071595
 ] 

Julian Hyde commented on CALCITE-7463:
--

I wonder whether the rule is relying on the fact that "SELECT mgr, comm FROM 
emp LIMIT 2" is a common relational sub-expression. This is similar to a CTE, 
and I know some systems (e.g. BigQuery) assume that each reference to a CTE 
produces the same rows in the same order.

I don't think we ever established how Calcite should treat common relational 
sub-expressions; maybe it's time to do so.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-06 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071564#comment-18071564
 ] 

Zhen Chen commented on CALCITE-7463:


Yes, relying on the physical ordering of data doesn't seem particularly 
convincing, as the scanning process itself could be parallelized. The scenario 
I was likely trying to describe is the following—though I'm not sure if this 
specific issue actually arises in practice: a single `SORT` operation can be 
followed by multiple `PROJECT` operations, and each `PROJECT` can, in turn, be 
followed by multiple `SORT` operations. During the rule rewriting phase, the 
system might not recurse deeply enough to fully inspect and account for this 
specific structure. The plan might look something like this.
{code:java}
LogicalUnion(all=[false])
  LogicalSort(fetch=[2])
LogicalProject(MGR=[$3], COMM=[$6])
     LogicalSort(sort0=[$3], dir0=[ASC])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalSort(fetch=[2])
LogicalProject(MGR=[$3], COMM=[$6])
  LogicalSort(sort0=[$3], dir0=[ASC])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])  {code}

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-06 Thread Mihai Budiu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071556#comment-18071556
 ] 

Mihai Budiu commented on CALCITE-7463:
--

I don't expect that SQL guarantees a scan order for a table.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-06 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071555#comment-18071555
 ] 

Zhen Chen commented on CALCITE-7463:


I haven’t figured out how to give an example using SQL. The scenario is like 
this: There’s a table t created with the statement 

"CREATE TABLE t (id INT, name VARCHAR(10))", and its physical data is stored in 
order of id. In such a scenario, performing a rewrite would definitely be 
incorrect. If we don’t consider that scenario, there’s a slightly more 
complicated situation: during query optimization, the sorting property from 
deeper levels of UNION subclauses can be propagated upward. However, during 
plan rewriting, I can only recognize LIMIT and cannot match the SORT at deeper 
levels. That’s what I’m concerned about.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-06 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071493#comment-18071493
 ] 

Julian Hyde commented on CALCITE-7463:
--

I have concerns similar to [~sphillips]'s, and I would like to see cases that 
are more clearly wrong. The first query is nondeterministic (because of LIMIT 
without ORDER BY) but the rewritten query yields one possible result.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-06 Thread Steven Phillips (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071484#comment-18071484
 ] 

Steven Phillips commented on CALCITE-7463:
--

My point is, in general, if we have 
()
union
()

where subquery is identical in each branch of the union, as long as the 
subquery doesn't contain non-deterministic functions like RANDOM, it should be 
safe to reduce this to just the subquery.

In this specific case, leaving LIMIT 2 is valid, since the two rows are the 
same two rows in each input to the union. This is because relations in sql 
don't have an ordering, so any ordering the planner chooses for unordered limit 
is valid. 

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-04 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071007#comment-18071007
 ] 

Zhen Chen commented on CALCITE-7463:


The current rewritten query has a  LIMIT 2 , but it should be a  LIMIT 4 . This 
is incorrect. Additionally, we cannot determine whether the child of the LIMIT 
operator, or its descendants, are already ordered. Therefore, the safest 
approach is to avoid rewriting it altogether.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-04 Thread Steven Phillips (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070997#comment-18070997
 ] 

Steven Phillips commented on CALCITE-7463:
--

Just based on this example, I'm not convinced this is a bug. Since limit 
without an order by is ambiguous, it seems to me that it would be perfectly 
valid for the planner to create a plan that returns the same 2 rows for each 
input to the union, which makes the union redundant. 

Are there other examples that are more clearly wrong when applying this rule? 
e.g. if you have different filters in each branch:


{code:sql}
(SELECT mgr, comm FROM emp WHERE deptno = 10 LIMIT 2)
UNION
(SELECT mgr, comm FROM emp WHERE deptno = 20 LIMIT 2)
{code}

This should not be rewritten.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT

2026-04-03 Thread Zhen Chen (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070965#comment-18070965
 ] 

Zhen Chen commented on CALCITE-7463:


We can resolve this issue by ensuring that the clauses within a SetOp do not 
begin with a Sort operation. It would be great if someone wanted to take on 
this issue; otherwise, if no one attempts it within a week, I will resolve it 
myself.

> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs 
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
>   LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains 
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET). 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)