Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21474#discussion_r192487033
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -429,7 +429,11 @@ package object config {
        "external shuffle service, this feature can only be worked when external shuffle" +
        "service is newer than Spark 2.2.")
      .bytesConf(ByteUnit.BYTE)
-     .createWithDefault(Long.MaxValue)
+     // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
+     // as well use fetch-to-disk in that case. The message includes some metadata in addition
+     // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
+     // extra room.
+     .createWithDefault(Int.MaxValue - 500)
--- End diff ---
No guarantee it's big enough. It seemed OK in the test I tried, but UploadBlock has some variable-length strings, so I can't say for sure.
I'm fine making the margin much bigger, e.g. 1 MB -- you'd only exceed that in a pathological case. Then there would be *some* cases where an old message that was fine with fetch-to-mem would now switch to fetch-to-disk. But that's such a tiny case, and not an unreasonable change even then ... so it should be OK.
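
For anyone skimming the numbers, here is a tiny standalone Scala sketch (not part of the patch; the object name and constants are purely illustrative) of the headroom arithmetic being discussed: a single RPC message has to stay under Int.MaxValue bytes (~2 GB), and the block data shares that budget with variable-length metadata such as the strings carried by UploadBlock, so the fetch-to-disk threshold needs some slack below the limit.

object FetchToDiskThresholdSketch {
  // Limit on a single RPC message: Int.MaxValue bytes (~2 GB). Anything larger
  // is guaranteed to fail with fetch-to-mem, so it should go to disk instead.
  val messageLimit: Long = Int.MaxValue.toLong

  // Margin in the current patch: 500 bytes of headroom for message metadata.
  val patchDefault: Long = messageLimit - 500

  // The larger margin floated above: ~1 MB of headroom; metadata bigger than
  // that (e.g. UploadBlock strings) would be a pathological case.
  val widerDefault: Long = messageLimit - 1024L * 1024L

  def main(args: Array[String]): Unit = {
    println(s"fetch-to-disk threshold with 500 B slack: $patchDefault")
    println(s"fetch-to-disk threshold with 1 MB slack:  $widerDefault")
  }
}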
---