liuzqt commented on code in PR #38064:
URL: https://github.com/apache/spark/pull/38064#discussion_r995232173
##########
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala:
##########
@@ -172,6 +247,8 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
private[spark] object ChunkedByteBuffer {
+ val COPY_BUFFER_LEN: Int = 1024 * 1024
Review Comment:
I added a `def estimateBufferChunkSize(estimatedSize: Long = -1)` to be used for both, but I'm not sure whether the heuristic is appropriate.
Another option: we could just use `1024` (1 KB) everywhere and keep it simple. In a quick benchmark, 1 KB wasn't much worse than 1 MB even for large results, and the overhead upper bound is reasonable even when the result is very tiny (in fact, even a nearly empty result still serializes to a few hundred bytes because of other metrics and accumulators).
WDYT?
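For illustration, a minimal sketch of what such a size-based heuristic could look like. This is hypothetical and not the actual implementation from the PR: the object name `BufferSizing`, the 1 KiB floor, and the clamping strategy are all assumptions; only the 1 MiB cap (`COPY_BUFFER_LEN`) and the `estimateBufferChunkSize(estimatedSize: Long = -1)` signature come from the discussion above.

```scala
// Hypothetical sketch, not the PR's implementation: pick a copy-buffer
// chunk size from an optional estimate of the payload size.
object BufferSizing {
  val MIN_CHUNK_LEN: Int = 1024        // assumed 1 KiB floor (the "simple" option)
  val MAX_CHUNK_LEN: Int = 1024 * 1024 // 1 MiB cap, matching COPY_BUFFER_LEN

  def estimateBufferChunkSize(estimatedSize: Long = -1): Int = {
    if (estimatedSize < 0) {
      // No estimate available: fall back to the 1 MiB default.
      MAX_CHUNK_LEN
    } else {
      // Clamp into [1 KiB, 1 MiB] so tiny results don't over-allocate
      // while large results still copy in big chunks.
      math.max(MIN_CHUNK_LEN.toLong, math.min(estimatedSize, MAX_CHUNK_LEN.toLong)).toInt
    }
  }
}
```

The clamp keeps the per-copy overhead bounded on both ends, which is roughly the trade-off the benchmark above is weighing.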
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------