[ https://issues.apache.org/jira/browse/SPARK-26781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26781. ---------------------------------- Resolution: Duplicate > Additional exchange gets added > ------------------------------- > > Key: SPARK-26781 > URL: https://issues.apache.org/jira/browse/SPARK-26781 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.4.0 > Reporter: Karuppayya > Priority: Major > > Consider three tables: a(id int), b(id int), c(id, int) > query: > {code:java} > select * from (select a.id as newid from a join b where a.id = b.id) temp > join c on temp.newid = c.id > {code} > Plan(physical plan: > org.apache.spark.sql.execution.QueryExecution#executedPlan): > > {noformat} > *(9) SortMergeJoin [newid#1L], [id#6L], Inner > :- *(6) Sort [newid#1L ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(newid#1L, 200) > : +- *(5) Project [id#2L AS newid#1L, name#3] > : +- *(5) SortMergeJoin [id#2L], [id#4L], Inner > : :- *(2) Sort [id#2L ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(id#2L, 200) > : : +- *(1) Project [id#2L, name#3] > : : +- *(1) Filter isnotnull(id#2L) > : : +- *(1) FileScan parquet a[id#2L,name#3] Batched: true, DataFilters: > [isnotnull(id#2L)], Format: Parquet, Location: > InMemoryFileIndex[file:/tmp/spark/a], PartitionFilters: [], PushedFilters: > [IsNotNull(id)], ReadSchema: struct<id:bigint,name:string> > : +- *(4) Sort [id#4L ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(id#4L, 200) > : +- *(3) Project [id#4L] > : +- *(3) Filter isnotnull(id#4L) > : +- *(3) FileScan parquet b[id#4L] Batched: true, DataFilters: > [isnotnull(id#4L)], Format: Parquet, Location: > InMemoryFileIndex[file:/tmp/spark/b], PartitionFilters: [], PushedFilters: > [IsNotNull(id)], ReadSchema: struct<id:bigint> > +- *(8) Sort [id#6L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#6L, 200) > +- *(7) Project [id#6L, name#7] > +- *(7) Filter isnotnull(id#6L) > +- *(7) FileScan parquet \c[id#6L,name#7] Batched: true, DataFilters: > [isnotnull(id#6L)], Format: Parquet, Location: > InMemoryFileIndex[file:/tmp/spark/c], PartitionFilters: [], PushedFilters: > [IsNotNull(id)], ReadSchema: struct<id:bigint,name:string>{noformat} > > The exchange operator below stage 6 is not required since the data from > project is already partitioned based on id. > An exchange gets added since the outputPartitioning of Project(5) is > HashPartitioning on id#2L whereas the requiredPartitioning of Sort(Stage 6) > is HashPartitioning on newid#1L which is nothing but alias of id#2L. > The exchange operator is not required in this case if we are able to compare > the attribute id#2L referenced by alias newid#1L 0 > This issue happens in TPC-DS benchmark query - query#2 > -- This message was sent by Atlassian Jira (v8.3.2#803003) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org