[
https://issues.apache.org/jira/browse/SPARK-47527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bruce Robbins updated SPARK-47527:
--
Description:
For example:
{noformat}
create or replace temp view v1 as
select id from range(10);
create or replace temp view q1 as
select * from v1
where id between 2 and 4;
cache table q1;
explain select * from q1;
== Physical Plan ==
*(1) Filter ((id#51L >= 2) AND (id#51L <= 4))
+- *(1) Range (0, 10, step=1, splits=8)
{noformat}
Similarly:
{noformat}
create or replace temp view q2 as
select count_if(id > 3) as cnt
from v1;
cache table q2;
explain select * from q2;
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(if (NOT _common_expr_0#88) null else _common_expr_0#88)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=182]
      +- HashAggregate(keys=[], functions=[partial_count(if (NOT _common_expr_0#88) null else _common_expr_0#88)])
         +- Project [(id#86L > 3) AS _common_expr_0#88]
            +- Range (0, 10, step=1, splits=8)
{noformat}
Neither of the explain outputs above includes an {{InMemoryRelation}} node, so
neither query reads from the cached data.
The culprit seems to be the common expression ids in the {{With}} expressions
used in runtime replacements for {{between}} and {{count_if}}, e.g. [this
code|https://github.com/apache/spark/blob/39500a315166d8e342b678ef3038995a03ce84d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala#L43].
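To illustrate the suspected mechanism, here is a toy model in Python. It is not Spark code: the classes, the id counter, and the {{normalize}} helper are all hypothetical names made up for this sketch. It assumes that each analysis of the same SQL mints a fresh common expression id, and that a cache lookup compares plans for equality. Under those assumptions, two analyses of the same {{between}} predicate never compare equal unless the ids are normalized first.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical stand-in for a per-analysis id generator (not Spark's API).
_next_id = count()


@dataclass(frozen=True)
class CommonExprRef:
    """Toy reference to a common expression, identified by a numeric id."""
    expr_id: int


@dataclass(frozen=True)
class Between:
    """Toy plan fragment: one shared reference compared against two bounds."""
    ref: CommonExprRef
    lower: int
    upper: int


def analyze_between(lower, upper):
    """Build the fragment as a fresh analysis would: with a newly minted id."""
    return Between(CommonExprRef(next(_next_id)), lower, upper)


def normalize(plan):
    """Replace the run-specific id with a fixed one before comparing."""
    return Between(CommonExprRef(0), plan.lower, plan.upper)


cached = analyze_between(2, 4)  # plan stored when the view is cached
query = analyze_between(2, 4)   # plan built later for the same SQL

raw_hit = cached == query                               # ids differ: miss
normalized_hit = normalize(cached) == normalize(query)  # ids ignored: hit
print(raw_hit, normalized_hit)  # prints "False True"
```

Normalizing the ids is shown only to make the contrast visible; whether that is the right fix in Spark is not established here.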
was:
For example:
{noformat}
create or replace temp view v1 as
select id from range(10);
create or replace temp view q1 as
select * from v1
where id between 2 and 4;
cache table q1;
explain select * from q1;
== Physical Plan ==
*(1) Filter ((id#51L >= 2) AND (id#51L <= 4))
+- *(1) Range (0, 10, step=1, splits=8)
{noformat}
Similarly:
{noformat}
create or replace temp view q2 as
select count_if(id > 3) as cnt
from v1;
cache table q2;
explain select * from q2;
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(if (NOT _common_expr_0#88) null else _common_expr_0#88)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=182]
      +- HashAggregate(keys=[], functions=[partial_count(if (NOT _common_expr_0#88) null else _common_expr_0#88)])
         +- Project [(id#86L > 3) AS _common_expr_0#88]
            +- Range (0, 10, step=1, splits=8)
{noformat}
In the output of the above explain commands, neither list an
{{InMemoryRelation}} node.
The culprit seems to be the common expression ids in the {{With}} expressions
used in runtime replacements for {{between}} and {{count_if}}, e.g. [this
code|https://github.com/apache/spark/blob/39500a315166d8e342b678ef3038995a03ce84d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala#L43].
> Cache miss for queries using With expressions
> -
>
> Key: SPARK-47527
> URL: https://issues.apache.org/jira/browse/SPARK-47527
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Bruce Robbins
> Priority: Major