Github user liyezhang556520 commented on a diff in the pull request:
https://github.com/apache/spark/pull/12083#discussion_r58287737
--- Diff:
common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java
---
@@ -44,6 +45,14 @@
private long totalBytesTransferred;
/**
+ * When the write buffer size is larger than this limit, I/O will be done in chunks of this size.
+ * The size should not be too large as it will waste underlying memory copy. e.g. If network
+ * avaliable buffer is smaller than this limit, the data cannot be sent within one single write
+ * operation while it still will make memory copy with this size.
+ */
+ private static final int NIO_BUFFER_LIMIT = 512 * 1024;
--- End diff --
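For readers following the thread, here is a minimal sketch of what "I/O in chunks of this size" can look like. It is not the code from the pull request: the class name `ChunkedWriteSketch` and the method `writeChunk` are made up for illustration, and only `NIO_BUFFER_LIMIT` and its value come from the diff above. The idea is to temporarily lower the buffer's limit so that a single `write()` call never sees more than `NIO_BUFFER_LIMIT` bytes, which bounds how much a heap buffer can be copied internally per call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChunkedWriteSketch {
  // Same value as in the diff above: 512 KiB.
  public static final int NIO_BUFFER_LIMIT = 512 * 1024;

  /**
   * Hypothetical helper: offers at most NIO_BUFFER_LIMIT bytes of buf to the
   * channel in one call and returns the number of bytes actually written.
   */
  public static int writeChunk(WritableByteChannel ch, ByteBuffer buf) throws IOException {
    int originalLimit = buf.limit();
    try {
      int ioSize = Math.min(buf.remaining(), NIO_BUFFER_LIMIT);
      // Shrink the visible window so the channel sees only one chunk.
      buf.limit(buf.position() + ioSize);
      return ch.write(buf);
    } finally {
      // Restore the limit so the caller can continue with the remaining bytes.
      buf.limit(originalLimit);
    }
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer data = ByteBuffer.allocate(3 * NIO_BUFFER_LIMIT + 100);
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    WritableByteChannel ch = Channels.newChannel(sink);
    int calls = 0;
    while (data.hasRemaining()) {
      writeChunk(ch, data);
      calls++;
    }
    System.out.println(calls + " writes, " + sink.size() + " bytes");
  }
}
```

The in-memory channel used in `main` always accepts everything it is offered. A non-blocking socket channel may accept less, in which case the caller simply calls `writeChunk` again later with the same buffer.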
> What if we create a DirectByteBuffer here manually for a big buf (big enough
> so that we still benefit even if creating a direct buffer is slow) and try
> to write as much as possible? Then we can avoid the memory copy in IOUtil.write.
@zsxwing, yes, the redundant copy can be avoided if we pass a direct buffer
straight to `WritableByteChannel.write()`, because of the check at
http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#50,
but I don't know whether that's worthwhile. `IOUtil` maintains a direct buffer
pool to avoid allocating direct buffers frequently. I think that explains what
I saw in my test. When the first job I ran was
`sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Long](1024 * 1024 * 200)).iterator).reduce((a,b)=> a).length`,
the network throughput on the executor side was extremely low. When I ran the
same code after first running
`sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length`,
the network throughput was much higher.
So if we want to create direct buffers manually in Spark, it would be better to
also maintain a buffer pool, but that would introduce much more complexity and
carries the risk of memory leaks.
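To make the trade-off concrete, here is a rough sketch of the simplest form of "manual direct buffer plus reuse": one staging buffer per thread instead of a full pool. Everything in it is an assumption for illustration (the class `DirectStagingSketch`, the method `writeViaDirect`, and the 256 KiB `STAGING_SIZE`); it is not something Spark does. Because the buffer handed to `write()` is already direct, the channel implementation has no need to copy it into a temporary direct buffer of its own, and the heap-to-direct copy is bounded by the staging size.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class DirectStagingSketch {
  // Hypothetical staging size; chosen arbitrarily for this sketch.
  public static final int STAGING_SIZE = 256 * 1024;

  // One reusable direct buffer per thread, allocated lazily on first use.
  private static final ThreadLocal<ByteBuffer> STAGING =
      ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(STAGING_SIZE));

  /**
   * Copies up to STAGING_SIZE bytes from a heap buffer into the thread's
   * direct buffer, writes them, and advances the source by the number of
   * bytes the channel accepted. Returns that number.
   */
  public static int writeViaDirect(WritableByteChannel ch, ByteBuffer heap) throws IOException {
    ByteBuffer direct = STAGING.get();
    direct.clear();
    int n = Math.min(heap.remaining(), direct.remaining());
    // Use a duplicate so the source position moves only after the write.
    ByteBuffer window = heap.duplicate();
    window.limit(window.position() + n);
    direct.put(window);
    direct.flip();
    int written = ch.write(direct);
    // Bytes staged but not accepted are re-copied on the next call.
    heap.position(heap.position() + written);
    return written;
  }
}
```

Even this reduced version shows where the complexity comes from: each thread that writes holds `STAGING_SIZE` bytes of off-heap memory for as long as it lives, and a real pool would also need sizing, eviction, and explicit release.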