GitHub user viirya commented on the issue:
https://github.com/apache/spark/pull/21291
Changing `outputPartitioning` changes some executed plans.
For example, in `WholeStageCodegenSuite`, the executed plan of the query
`spark.range(3).groupBy("id").count().orderBy("id")` changes
from
```
*(3) Sort [id#22L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(id#22L ASC NULLS FIRST, 5)
   +- *(2) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, count#26L])
      +- Exchange hashpartitioning(id#22L, 5)
         +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], output=[id#22L, count#31L])
            +- *(1) Range (0, 3, step=1, splits=2)
```
to
```
*(1) Sort [id#22L ASC NULLS FIRST], true, 0
+- *(1) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, count#26L])
   +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], output=[id#22L, count#31L])
      +- *(1) Range (0, 3, step=1, splits=2)
```
I will update related tests.
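
For anyone who wants to inspect the plan locally, a minimal sketch along these lines should print the executed plan for the same query. The object name, the `local[2]` master, and the `spark.sql.shuffle.partitions` value of 5 are assumptions chosen to resemble the test environment (they are not taken from the suite itself), and the printed plan will depend on the Spark build in use.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone driver; not part of WholeStageCodegenSuite.
object ExplainGroupByOrderBy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")                          // assumption: two local cores
      .appName("explain-groupby-orderby")
      .config("spark.sql.shuffle.partitions", "5") // assumption: matches the 5 in the plans above
      .getOrCreate()

    // The query quoted in the comment above.
    val df = spark.range(3).groupBy("id").count().orderBy("id")

    // Prints the physical plan to stdout; compare the Exchange nodes
    // and the *(n) whole-stage codegen ids with the plans above.
    df.explain()

    spark.stop()
  }
}
```

Running this on builds with and without the `outputPartitioning` change is one way to see whether the two `Exchange` nodes are still planned.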