HyukjinKwon opened a new pull request, #37270:
URL: https://github.com/apache/spark/pull/37270
### What changes were proposed in this pull request?
This PR proposes to avoid out-of-memory errors in the TPC-DS build on GitHub Actions CI
by:
- Increasing the number of partitions used in the shuffle.
- Truncating floats after the 10th decimal place.
The number of partitions was previously set to 1 because more partitions
produce small differences in float precision, which we can generally ignore.
- Sorting the results regardless of join type, since Apache Spark does not
guarantee the order of results.
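
The truncate-and-sort idea above can be sketched as follows. This is a minimal illustration only, not the code in this PR: the helper names (`truncate_floats`, `normalize`), the tab-separated row format, and the sample values are all hypothetical, and it assumes "after 10th" means the 10th decimal place.

```python
import re

# Hypothetical pattern: keep the integer part and the first 10 decimal
# digits, and drop any further digits that follow them.
_FLOAT_TAIL = re.compile(r"(\d+\.\d{10})\d+")


def truncate_floats(line: str) -> str:
    """Drop every digit after the 10th decimal place in each float."""
    return _FLOAT_TAIL.sub(r"\1", line)


def normalize(rows):
    """Truncate floats, then sort so row order no longer matters."""
    return sorted(truncate_floats(row) for row in rows)


# Two made-up result sets that differ only in row order and in the
# trailing float digits, as might happen with more than one partition.
run_a = ["b\t0.33333333333333331", "a\t1.5"]
run_b = ["a\t1.5", "b\t0.33333333333333337"]

assert normalize(run_a) == normalize(run_b)
```

Comparing normalized output this way tolerates both sources of nondeterminism at once, at the cost of no longer checking row order for queries that do specify one.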
### Why are the changes needed?
One of the reasons for the large memory usage seems to be the single partition
used in the shuffle.
### Does this PR introduce _any_ user-facing change?
No, test-only.
### How was this patch tested?
The GitHub Actions CI run for this PR will test it out.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]