[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074929#comment-18074929
]
Julian Hyde commented on CALCITE-7463:
--
My view is that the planner could run in an ARBITRARY_REPEATABLE mode where two
expressions that are arbitrary and structurally equal are considered to be
equivalent (i.e. always return the same results). In that mode, rewrites such
as the above would be valid.
If a rule needs ARBITRARY_REPEATABLE, it would have to declare it. The rule
would not fire in a planner where ARBITRARY_REPEATABLE is false.
We would need to classify expressions into deterministic, arbitrary, random.
(We might need to refine the distinction because an expression like 'select *
from emp order by deptno' has deterministic rows but arbitrary ordering;
'select * from emp order by deptno limit 3' has arbitrary rows and ordering.)
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074837#comment-18074837
]
Zhen Chen commented on CALCITE-7463:
My view is that we shouldn't pile one uncertainty on top of another. If we
cannot reach a consensus, I think we should simply set this issue aside for the
time being—because, in my opinion, this kind of optimization is pointless.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074386#comment-18074386
]
Steven Phillips commented on CALCITE-7463:
--
That's actually not quite what I'm saying. The fact that it probably does it
anyway is not the foundation of my argument as to why it's correct. My argument
is basically:
Limit without order by means: return two rows, it doesn't matter which ones or
what method is used to choose them.
Therefore, choosing the two rows for the subqueries such that they return the
exact same two rows in each instance is a valid optimizer choice, as it meets
the requirements of the request.
With the random() function, there is a different meaning. It doesn't mean give
me any value, I don't care what or how you decided. It explicitly means, give
me a value with random probability over a uniform distribution. So, while it
could be any value, I do care how it's done, or at least I expect the result to
be generated in a way that is (pseudo)random.
Another analogy would be the ANY aggregate function, which some engines
support. It's arbitrary, but not random, so it would be valid to transform the
query:
{code:sql}
select any(x) from t
{code}
to
{code:sql}
select min(x) from t
{code}
whereas:
{code}
select x from t order by random() limit 1
{code}
would not be valid to transform this way, since the random() function has the
expectation of (pseudo)randomness.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074382#comment-18074382
]
Julian Hyde commented on CALCITE-7463:
--
I see. You're saying that we are allowed to "cook the books" and assume that
the two scans return rows in the same order - because it's PROBABLY what would
happen anyway.
The argument about nondeterministic vs random gets rather philosophical, so
could we couch it in terms of probability? Namely, it's not valid to optimize
'select random() except select random()' to the empty relation because, while
it's possible that both calls to random() return the same number, it's unlikely.
Or I guess we could say that the order of scans is *arbitrary* and therefore we
*choose* that they will return the same.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074378#comment-18074378
]
Steven Phillips commented on CALCITE-7463:
--
The point of optimizing queries is for them to perform faster with less
resources. It's the goal of a query optimizer. And the unoptimized query in
this case could potentially be very expensive, e.g. if the limit is much
higher, and there are more than just 2 instances. And I'm generally not a fan
of removing optimizations that give correct results.
My point is, in this particular case, it is entirely, 100% up to the optimizer
to decide what to return. When I request 2 arbitrary rows from a relation,
without specifying an order by, it is valid for the optimizer to choose any
plan whatsoever that satisfies that request. If it decides that a plan that
returns the exact same result set every time is the best plan, that is
perfectly fine. If the planner decides some other plan, which potentially
returns different results each time is best, that's also valid. The planner can
also choose a plan that returns the same result every time in some cases, and a
plan that returns different results every time in other cases, and that's still
fine. The only requirement is that planner gives a plan that returns results
that are consistent with what the query requests.
So I disagree with [~julianhyde] slightly, in that, I think this optimization
is valid if we don't consider CTEs/subqueries to be repeatable, which is
different from how we should handle the RANDOM() function, as RANDOM() comes
with an expectation of randomness, whereas LIMIT means give me any result that
matches, I don't care how.
In other words, I think
{code:sql}
select random()
union
select random()
{code}
should not be reduced, but if we treat it as a REPEATABLE CTE like Julian is
suggesting, then it could be reduced, since that's what repeatable cte means.
but
{code:sql}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2)
{code}
has no expectation of randomness, so reducing it is valid either way.
Now, if we decide that it makes things simpler to handle by treating the
"arbitrary" queries like LIMIT the same as "explcit random" queries, that could
be a valid decision we could make.
And I still am confused why anyone would want to make sure it doesn't reduce
this case. In many cases, this query would return the same result anyway, e.g.
when the table is small, it's likely many engines would start reading from the
beginning of the underlying file. If the goal is to force some
randomness/arbitrariness into the execution engine, this isn't the way to do it.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074377#comment-18074377
]
Julian Hyde commented on CALCITE-7463:
--
[~jensen], I would agree if we were talking about {{UNION ALL}}. But if the two
{{LIMIT 2}} subqueries produce the same rows, the {{UNION}} will produce the
same rows.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074376#comment-18074376
]
Zhen Chen commented on CALCITE-7463:
[~julianhyde] Thank you. You always provide solutions when problems arise,
rather than letting them fall into endless loops. I agree that supporting a
"repeatable" mode would be more rigorous. However, the example happens to
demonstrate a special case. The execution result of the current example is
actually similar to a "repeatable" scenario, but the rewritten result is not
consistent—no matter what number follows the "limit" in "select distinct...".
Nevertheless, I agree that we could adopt a "repeatable mode".
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074373#comment-18074373
]
Zhen Chen commented on CALCITE-7463:
[~sphillips] Actually, whether it's "random" or "nondeterministic," what I want
to express is: what's the point of optimizing such a situation? Wouldn't it be
better to preserve the original uncertainty of an "uncertain" scenario? The
logic that "{*}nondeterministic"{*} + "nondeterministic" = a subset of
"{*}nondeterministic"{*} doesn't hold, in my view.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074311#comment-18074311
]
Julian Hyde commented on CALCITE-7463:
--
I agree that there are both *nondeterministic* operations and truly *random*
operations, but for these purposes we really care whether a data set is
*repeatable*.
In some databases (such as BigQuery), CTEs are repeatable. If I write
{code}
with T as
select * from T
except
select * from T
{code}
Calcite's default behavior for CTEs is to inline them (as if they were views),
so there is some chance that the two executions will produce different rows.
(In CALCITE-6716, I proposed a {{REPEATABLE}} keyword to explicitly declare
whether a CTE is repeatable.)
I think we need to parameterize the environment executing planner rules with a
boolean mode. The mode determines whether two RelNodes that are deeply-equal
are considered to be repeatable. It is valid to collapse
{code}
(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2)
{code}
into
{code}SELECT DISTINCT mgr, comm FROM emp LIMIT 2{code}
if and only if the mode is true.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074286#comment-18074286
]
Steven Phillips commented on CALCITE-7463:
--
You should be careful about how using the word random. In statistics, random
means something.
The result of the query:
{code:sql}
(SELECT id FROM t limit 2)
UNION
(SELECT id FROM t limit 2)
{code}
is, in general, not random, but rather non-deterministic. Meaning that, within
the space of all possible correct results, there is no expectation of which
result will be returned. And this could be implementation specific.
The RANDOM() function, on the other hand, implies statistical randomness. The
expectation is that each instance of this function will potentially return a
different result, and the probability of any specific result is given according
to a specified probability distribution (in this case uniform).
In your random() * random() example, the probability distribution of the
product is not uniform over the interval [0,1), so the two expressions are not
equivalent.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074232#comment-18074232
]
Zhen Chen commented on CALCITE-7463:
Sorry for delay reply. Please see this case.
CREATE TABLE t (
id INTEGER,
name VARCHAR(10)
);
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (1, 'b');
INSERT INTO t VALUES (1, 'c');
INSERT INTO t VALUES (2, 'd');
INSERT INTO t VALUES (2, 'e');
INSERT INTO t VALUES (3, 'f');
INSERT INTO t VALUES (3, 'g');
INSERT INTO t VALUES (3, 'h');
INSERT INTO t VALUES (3, 'i');
(SELECT id FROM t limit 2)
UNION
(SELECT id FROM t limit 2);
-- id
--
-- 1
-- (1 row)
select distinct id from t limit 2;
-- id
--
-- 3
-- 2
-- (2 rows)
1. UNION with LIMIT in subqueries:
Each subquery's {{LIMIT }}executes first → may get duplicate rows
{{UNION }}removes duplicates afterward → final result may have fewer rows
2. DISTINCT with LIMIT:
{{DISTINCT }}executes first → gets unique values
{{LIMIT }}applies to distinct results → stable row count
Execution order of operations causes different outcomes. Please note that if
table t is not strictly ordered, the result of the first SQL query will be
unstable, possibly returning 1 to 3 rows, while the result of the second SQL
query is stable with 2 rows.
If we can accept this random scenario, meaning the logic that "random" +
"random" = "random" holds true, let's look at the following example:
select id from t limit 10 * random();
select id from t limit 10 * random() * random();
Are these two SQL statements equivalent?
If the above equation holds, I believe these two SQL statements are equivalent.
Is my conclusion wrong?
If the logic that "random" + "random" = "random" is incorrect, can the
rewriting of UNION be considered wrong?
Is it reasonable to add "uncertainty" on top of "uncertainty"?
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073404#comment-18073404
]
Zhen Chen commented on CALCITE-7463:
I recall that we fixed an issue about whether rand() can be pushed down. I'll
look for the Jira ticket later. So, if we consider that "random" + "random" =
"random" is valid, can rand() be freely pulled up or pushed down?
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073399#comment-18073399
]
Alessandro Solimando commented on CALCITE-7463:
---
[~sphillips], another example involving an ORDER BY was proposed here, but we
it's still not clear to me if this one is problematic either. I agree with your
comment, though, as long as we pick one legal alternative and we reduce cost,
the planner it's doing its job.
Thanks [~zabetak] for pointing to CALCITE-4160, which has a lot of interesting
details.
I re-took a look at the rule details, and I am pretty convinced that there is
no issue with this rule, as per the SQL standard, unless we can find a concrete
counter-example, I think we can close with "Not a bug" for now and revisit if
any problem surfaces later, wdyt [~jensen]?
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073365#comment-18073365
]
Stamatis Zampetakis commented on CALCITE-7463:
--
As others mentioned already, the example query is non deterministic so we
cannot really say that transformation is invalid. I would also be curious to
understand better the motivation behind this change. Which use-case will
benefit from the "fix"?
Another point to keep in mind is that ORDER BY and LIMIT in sub-queries are
optional features from a SQL standard perspective so some systems may not even
allow such constructs (see CALCITE-4160 for more details around ORDER BY).
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073355#comment-18073355
]
Steven Phillips commented on CALCITE-7463:
--
I'm still of the opinion that this rewrite is safe, and I haven't seen any
examples where this rewrite gives an incorrect result. The fact that the
rewrite narrows the space of possible correct results to a smaller space of
correct results does not make the results incorrect. Part of my reasoning is
based on the fact that, given a LIMIT without an ORDER BY, there are no
expectations about which rows get returned, so the optimizer making a choice to
return rows in a way that satisfy the limit but also reduce the cost of the
query is in fact a good thing.
I want to know what the real problem we're trying to solve is. Just one example
where this rule results in an incorrect result, or even undesirable plan.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073345#comment-18073345
]
Zhen Chen commented on CALCITE-7463:
[~asolimando] Thank you for raising this point. Regarding the boundary issue,
I’d like to double-check: if all subqueries in the UNION are of the form like
{{{}SELECT mgr FROM emp ORDER BY mgr LIMIT 2{}}}, I believe rewriting is
possible.
Does the {{isRowLimited }}flag alone not sufficiently determine this?
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18073297#comment-18073297
]
Alessandro Solimando commented on CALCITE-7463:
---
Summarizing my thoughts after the discussion here and reviewing the PR.
I agree that LIMIT without ORDER BY is nondeterministic. However, collapsing
two independently-evaluated LIMIT 2 branches into a single one strictly narrows
the set of possible outcomes (at most 2 rows instead of up to 4 before dedup).
Being conservative as [~jensen] suggests seems safer to me.
If we agree on this, I suggest we introduce a new metadata property (e.g.,
{_}isRowLimited{_}), capturing whether a subtree depends on row-limiting
operations, and use that to detect and bail out on the pattern.
Re. what Julian says around CTEs: this seems strongly related to [~zabetak]'s
work on {_}RelCommonExpressionSuggester{_}, I wonder if he has given the topic
a thought. IIRC _RelCommonExpressionBasicSuggester_ considers _deepEquals_
subtrees as identical and replaceable with a single CTE. Would it lead to the
same situation we have now?
Since Julian cites BigQuery as an example where this rewrite would be legal,
but there are potentially others where it is not, maybe we should have a
"method" in an interface, that downstream systems can override to decide, given
two (identical) expressions, if they are the same or not?
At that point the rule would use that deciding point and bail out for cases
when the collapsing wouldn't be safe.
WDYT?
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071650#comment-18071650
]
Zhen Chen commented on CALCITE-7463:
This rule actually leverages Calcite operator’s deepEquals for comparison—two
subexpressions are considered to have the same origin as long as they are
structurally identical. Based on this "same origin" assumption, rewrite
optimizations are applied. Before the rewrite, there are two separate sub-plans
(if there are two SCANs, they can execute independently), but after the
rewrite, the two identical sub-plans are merged into one.
Regarding the behavior you mentioned—similar to CTE—I’m not certain how we
should define or enforce it. Perhaps we can establish a specification, but I’m
unsure how to design such a rewrite rule in a way that everyone would find
reasonable and consistent.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071595#comment-18071595
]
Julian Hyde commented on CALCITE-7463:
--
I wonder whether the rule is relying on the fact that "SELECT mgr, comm FROM
emp LIMIT 2" is a common relational sub-expression. This is similar to a CTE,
and I know some systems (e.g. BigQuery) assume that each reference to a CTE
produces the same rows in the same order.
I don't think we ever established how Calcite should treat common relational
sub-expressions; maybe it's time to do so.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071564#comment-18071564
]
Zhen Chen commented on CALCITE-7463:
Yes, relying on the physical ordering of data doesn't seem particularly
convincing, as the scanning process itself could be parallelized. The scenario
I was likely trying to describe is the following—though I'm not sure if this
specific issue actually arises in practice: a single `SORT` operation can be
followed by multiple `PROJECT` operations, and each `PROJECT` can, in turn, be
followed by multiple `SORT` operations. During the rule rewriting phase, the
system might not recurse deeply enough to fully inspect and account for this
specific structure. The plan might look something like this.
{code:java}
LogicalUnion(all=[false])
LogicalSort(fetch=[2])
LogicalProject(MGR=[$3], COMM=[$6])
LogicalSort(sort0=[$3], dir0=[ASC])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalSort(fetch=[2])
LogicalProject(MGR=[$3], COMM=[$6])
LogicalSort(sort0=[$3], dir0=[ASC])
LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071556#comment-18071556
]
Mihai Budiu commented on CALCITE-7463:
--
I don't expect that SQL guarantees a scan order for a table.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071555#comment-18071555
]
Zhen Chen commented on CALCITE-7463:
I haven’t figured out how to give an example using SQL. The scenario is like
this: There’s a table t created with the statement
"CREATE TABLE t (id INT, name VARCHAR(10))", and its physical data is stored in
order of id. In such a scenario, performing a rewrite would definitely be
incorrect. If we don’t consider that scenario, there’s a slightly more
complicated situation: during query optimization, the sorting property from
deeper levels of UNION subclauses can be propagated upward. However, during
plan rewriting, I can only recognize LIMIT and cannot match the SORT at deeper
levels. That’s what I’m concerned about.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071493#comment-18071493
]
Julian Hyde commented on CALCITE-7463:
--
I have concerns similar to [~sphillips]'s, and I would like to see cases that
are more clearly wrong. The first query is nondeterministic (because of LIMIT
without ORDER BY) but the rewritten query yields one possible result.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071484#comment-18071484
]
Steven Phillips commented on CALCITE-7463:
--
My point is, in general, if we have
()
union
()
where subquery is identical in each branch of the union, as long as the
subquery doesn't contain non-deterministic functions like RANDOM, it should be
safe to reduce this to just the subquery.
In this specific case, leaving LIMIT 2 is valid, since the two rows are the
same two rows in each input to the union. This is because relations in sql
don't have an ordering, so any ordering the planner chooses for unordered limit
is valid.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18071007#comment-18071007
]
Zhen Chen commented on CALCITE-7463:
The current rewritten query has a LIMIT 2 , but it should be a LIMIT 4 . This
is incorrect. Additionally, we cannot determine whether the child of the LIMIT
operator, or its descendants, are already ordered. Therefore, the safest
approach is to avoid rewriting it altogether.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070997#comment-18070997
]
Steven Phillips commented on CALCITE-7463:
--
Just based on this example, I'm not convinced this is a bug. Since limit
without an order by is ambiguous, it seems to me that it would be perfectly
valid for the planner to create a plan that returns the same 2 rows for each
input to the union, which makes the union redundant.
Are there other examples that are more clearly wrong when applying this rule?
e.g. if you have different filters in each branch:
{code:sql}
(SELECT mgr, comm FROM emp WHERE deptno = 10 LIMIT 2)
UNION
(SELECT mgr, comm FROM emp WHERE deptno = 20 LIMIT 2)
{code}
This should not be rewritten.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (CALCITE-7463) UnionToFilterRule incorrectly rewrites UNION with LIMIT
[
https://issues.apache.org/jira/browse/CALCITE-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070965#comment-18070965
]
Zhen Chen commented on CALCITE-7463:
We can resolve this issue by ensuring that the clauses within a SetOp do not
begin with a Sort operation. It would be great if someone wanted to take on
this issue; otherwise, if no one attempts it within a week, I will resolve it
myself.
> UnionToFilterRule incorrectly rewrites UNION with LIMIT
> ---
>
> Key: CALCITE-7463
> URL: https://issues.apache.org/jira/browse/CALCITE-7463
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.41.0
>Reporter: Zhen Chen
>Priority: Major
> Fix For: 1.42.0
>
>
> The {{UnionToFilterRule}} produces incorrect results when applied to inputs
> that contain {{{}LIMIT{}}}.
> Specifically, the rule incorrectly collapses:
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> into:
> {code:java}
> SELECT DISTINCT mgr, comm FROM emp LIMIT 2 {code}
> This transformation is {*}not semantically equivalent{*}.
> *Reproduction*
> SQL
> {code:java}
> (SELECT mgr, comm FROM emp LIMIT 2)
> UNION
> (SELECT mgr, comm FROM emp LIMIT 2) {code}
> h4. Plan Before
> {code:java}
> LogicalUnion(all=[false])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Plan After (Incorrect)*
> {code:java}
> LogicalAggregate(group=[{0, 1}])
> LogicalSort(fetch=[2])
> LogicalProject(MGR=[$3], COMM=[$6])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> *Expected Behavior*
> The transformation should NOT be applied when any input of UNION contains
> LogicalSort(That contains ORDER BY, LIMIT, OFFSET).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
