[GitHub] [spark] kazuyukitanimura opened a new pull request #33721: [SPARK-32210][CORE] Fix NegativeArraySizeException in MapOutputTracker with large spark.default.parallelism

GitBox Thu, 12 Aug 2021 03:25:22 -0700


kazuyukitanimura opened a new pull request #33721:
URL: https://github.com/apache/spark/pull/33721



   ### What changes were proposed in this pull request?
   The current `MapOutputTracker` class may throw `NegativeArraySizeException` 
with a large number of partitions. The `serializeOutputStatuses()` method 
compresses an array of `mapStatuses` and writes the binary data into an 
(Apache) `ByteArrayOutputStream`. Inside the (Apache) 
`ByteArrayOutputStream.toByteArray()` call, the exception is thrown because the 
size is held in an `int`, which overflows (2GB limit) when the output binary 
is too large.
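
The overflow itself can be reproduced without allocating 2GB. The sketch below is an illustration only, not code from `MapOutputTracker` or from the Apache stream class; `IntOverflowSketch` and its helper names are hypothetical. It shows that adding to an `int` size past `Integer.MAX_VALUE` wraps to a negative value, and that asking the JVM for an array of that size raises `NegativeArraySizeException`.

```java
// Illustration only: how a size past Integer.MAX_VALUE turns into
// NegativeArraySizeException. No large allocation is performed.
final class IntOverflowSketch {
    // Hypothetical helper: the combined size a single byte[] would need.
    // int arithmetic wraps silently instead of failing.
    static int combinedSize(int bytesSoFar, int bytesToAdd) {
        return bytesSoFar + bytesToAdd;
    }

    // Returns true when allocating a byte[] of the given size is rejected
    // because the size is negative. Only call this with small or negative
    // sizes; a huge positive size would try to allocate for real.
    static boolean allocationFails(int size) {
        try {
            byte[] unused = new byte[size];
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int size = combinedSize(Integer.MAX_VALUE, 1);
        System.out.println(size < 0);              // the sum wrapped around
        System.out.println(allocationFails(size)); // the JVM rejects the size
    }
}
```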
   
   This PR proposes two high-level ideas:
     1. Use `org.apache.spark.util.io.ChunkedByteBufferOutputStream`, which has 
a way to output the underlying buffer as `Array[Array[Byte]]`.
     2. Change the signatures from `Array[Byte]` to `Array[Array[Byte]]` in 
order to handle over 2GB compressed data.
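
The two ideas above can be sketched as a stream that never holds the whole output in one array. This is a minimal Java sketch under stated assumptions, not Spark's `ChunkedByteBufferOutputStream`: the class `ChunkedOutputSketch`, its constructor, and `toChunks()` are hypothetical names, and `byte[][]` stands in for Scala's `Array[Array[Byte]]`.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: buffers written bytes in fixed-size chunks so that
// no single array has to hold the entire output.
final class ChunkedOutputSketch extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> fullChunks = new ArrayList<>();
    private byte[] current;
    private int position = 0; // next free index in `current`

    ChunkedOutputSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.current = new byte[chunkSize];
    }

    @Override
    public void write(int b) {
        // Start a new chunk once the current one is full.
        if (position == chunkSize) {
            fullChunks.add(current);
            current = new byte[chunkSize];
            position = 0;
        }
        current[position++] = (byte) b;
    }

    // Returns the data as an array of chunks; the last chunk is trimmed
    // to the number of bytes actually written into it.
    byte[][] toChunks() {
        List<byte[]> out = new ArrayList<>(fullChunks);
        if (position > 0) {
            out.add(Arrays.copyOf(current, position));
        }
        return out.toArray(new byte[0][]);
    }
}
```

Because the sketch extends `OutputStream`, a compressing stream can wrap it in the same way it would wrap a single-array buffer. Each chunk stays well under the `int` limit as long as the chunk size does, while the total across chunks can exceed 2GB.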
   
   
   ### Why are the changes needed?
   This issue seems to have been missed in the earlier effort to address the 
2GB limitations, [SPARK-6235](https://issues.apache.org/jira/browse/SPARK-6235).
   
   Without this fix, `spark.default.parallelism` needs to be kept low. The 
drawback of setting a smaller `spark.default.parallelism` is that it requires 
more executor memory (more data per partition). Setting 
`spark.io.compression.zstd.level` to a higher number (default 1) hardly helps.
   
   That essentially means there is a limit on the size of data that can be 
shuffled, so shuffling does not scale.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. Optionally users can tune the chunk size through the 
`spark.shuffle.mapStatus.output.chunkSize` config.
   
   
   ### How was this patch tested?
   Passed existing tests
   ```
   build/sbt "core/testOnly org.apache.spark.MapOutputTrackerSuite"
   ```
   Also added a new unit test
   ```
   build/sbt "core/testOnly org.apache.spark.MapOutputTrackerSuite -- -z SPARK-32210"
   ```
   Ran the benchmark using GitHub Actions and did not observe any 
performance penalties. The results are attached in this PR:
   ```
   core/benchmarks/MapStatusesSerDeserBenchmark-jdk11-results.txt
   core/benchmarks/MapStatusesSerDeserBenchmark-results.txt
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
