Github user mu5358271 commented on the issue:
https://github.com/apache/spark/pull/22961
Did some performance evaluation on a 1 GB test dataset on a small cluster with the following script:
```
import java.util.UUID
import org.apache.spark.SparkContext
import …
```
Github user mu5358271 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22961#discussion_r232888324
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -214,13 +214,22 @@ object ShuffleExchangeExec
Github user mu5358271 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22961#discussion_r232070793
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
@@ -214,13 +214,24 @@ object ShuffleExchangeExec
Github user mu5358271 commented on the issue:
https://github.com/apache/spark/pull/22961
cc @cloud-fan @gatorsmile @hvanhovell
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
GitHub user mu5358271 opened a pull request:
https://github.com/apache/spark/pull/22961
[SPARK-25947][SQL] Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
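The idea in the title can be illustrated outside Spark. This sketch assumes the saving comes from the rows that are copied and sampled to compute range-partition bounds. The code is plain Scala with no Spark dependency, and `Row`, `rangeBounds`, `partitionFor`, and `SortColumnSampling` are hypothetical names, not Spark APIs. A deterministic every-10th-row sample stands in for the random sampling that Spark's `RangePartitioner` performs. Each row is projected to its sort key before sampling, so the sample that drives the bounds holds only keys instead of full rows.

```scala
// Hypothetical illustration in plain Scala (no Spark): project rows to their sort
// column before sampling, so the sample used to pick range bounds holds only keys.
object SortColumnSampling {
  // Hypothetical wide row: only `key` takes part in the sort; `payload` is dead weight
  // that would be copied if whole rows were sampled.
  final case class Row(key: Int, payload: String)

  // Choose numPartitions - 1 upper bounds from the sorted sample of keys.
  def rangeBounds(sampleKeys: Seq[Int], numPartitions: Int): Seq[Int] = {
    val sorted = sampleKeys.sorted
    if (sorted.isEmpty || numPartitions <= 1) Seq.empty
    else (1 until numPartitions).map { i =>
      sorted(math.min(sorted.length - 1, i * sorted.length / numPartitions))
    }
  }

  // A key goes to the first partition whose upper bound is >= key (the last one otherwise).
  def partitionFor(key: Int, bounds: Seq[Int]): Int = {
    val idx = bounds.indexWhere(b => key <= b)
    if (idx == -1) bounds.length else idx
  }

  def main(args: Array[String]): Unit = {
    val rows = (0 until 1000).map(i => Row(key = (i * 37) % 1000, payload = "x" * 100))

    // Sample only the projected sort keys (every 10th row here, standing in for the
    // random sampling Spark does) instead of copying whole Row objects.
    val sampleKeys: Seq[Int] =
      rows.iterator.map(_.key).zipWithIndex.collect { case (k, i) if i % 10 == 0 => k }.toVector

    val bounds = rangeBounds(sampleKeys, numPartitions = 4)
    val counts = rows.groupBy(r => partitionFor(r.key, bounds)).map { case (p, rs) => p -> rs.size }
    println(s"bounds = $bounds, rows per partition = ${counts.toSeq.sorted}")
  }
}
```

The bounds come out the same whether whole rows or only keys are sampled, because only the key is ever compared. Projecting first simply avoids holding the unused `payload` columns in memory during sampling.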
## What changes were proposed in this pull request?
When sorting rows