[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/17329 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108881597 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Ping @witgo to update or close --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108067937 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- I'm simply describing what you proposed above at https://github.com/apache/spark/pull/17329#discussion_r107706851 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108049460 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Sorry,I didn't get your idea. Can you write some code? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108032669 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- This still doesn't address the point. What you say is true even if you make the change I suggest, which is to remove the superfluous constructor. The performance is exactly the same. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108032378 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Suppose there are E Executor in the cluster, a shuffle process has M Map task, R reduce task, in the master branch will be created: 1. Up to M * R FileSegmentManagedBuffer instances 2. Up to 2 * M * R NoSuchElementException instances in this PR will be created: 1. Up to M * R FileSegmentManagedBuffer instances 2. Up to 2 * NoSuchElementException instances (ExternalShuffleBlockResolver and IndexShuffleBlockResolver are created once executor starts and They call the new constructor to create a FileSegmentManagedBuffer instance) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108031525 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- I'm not sure what you're stating, that this is slow? but you also only made half the changes as you did here. I don't see what the argument is. Removing the constructor you added can't have an effect on performance either way. In either case, you access the value once in the constructor. I think all you're showing is that the first file changed here doesn't matter. You can revert this entirely too. It's ExternalBlockShufflerManager that matters. Make one or the other change and this can proceed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r108027203 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- This branch [SPARK-19991_try2 ](https://github.com/witgo/spark/commits/SPARK-19991_try2) needs `244.45` s in my test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r107846996 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- But if that's true, then the proposed changed to this class here already doesn't help anything. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r107841105 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Yes, But the above code does not optimize performance, `FileSegmentManagedBuffer.convertToNetty` method is also called only once in the master branch code . --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r107707746 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Yes exactly. The point is you just check them once, right? this also accomplishes that goal. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r107706851 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Like the following code? ```java public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { this.lazyFileDescriptor = conf.lazyFileDescriptor(); this.memoryMapBytes = conf.memoryMapBytes(); this.file = file; this.offset = offset; this.length = length; } ``` the code `conf.lazyFileDescriptor();` or `conf.memoryMapBytes();` creates a NoSuchElementException instance. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r107611158 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- No, why? I mean keep exactly the same constructors, no more no less. No code would change. You just set your two new fields in the current constructor. It actually means you don't need some of the changes you made here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r107572297 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- That will change a lot of code, right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r106848680 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Just don't add a new constructor. The existing one can set the new fields. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r106781598 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Oh, do you have a better idea? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17329#discussion_r106778634 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -37,13 +37,24 @@ * A {@link ManagedBuffer} backed by a segment in a file. */ public final class FileSegmentManagedBuffer extends ManagedBuffer { - private final TransportConf conf; + private final boolean lazyFileDescriptor; + private final int memoryMapBytes; private final File file; private final long offset; private final long length; public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) { -this.conf = conf; +this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length); + } + + public FileSegmentManagedBuffer( --- End diff -- Yeah that makes sense then but I don't think you need a new public constructor for this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/17329 [SPARK-19991]FileSegmentManagedBuffer performance improvement FileSegmentManagedBuffer performance improvement. ## What changes were proposed in this pull request? When we do not set the value of the configuration items `spark.storage.memoryMapThreshold` and `spark.shuffle.io.lazyFD`, each call to the cFileSegmentManagedBuffer.nioByteBuffer or FileSegmentManagedBuffer.createInputStream method creates a NoSuchElementException instance. This is a more time-consuming operation. In the use case, this PR can improve the performance of about 3.5% The test code: ``` scala (1 to 10).foreach { i => val numPartition = 1 val rdd = sc.parallelize(0 until numPartition).repartition(numPartition).flatMap { t => (0 until numPartition).map(r => r * numPartition + t) }.repartition(numPartition) val serializeStart = System.currentTimeMillis() rdd.sum() val serializeFinish = System.currentTimeMillis() println(f"Test $i: ${(serializeFinish - serializeStart) / 1000D}%1.2f") } ``` and `spark-defaults.conf` file: ``` spark.master yarn-client spark.executor.instances 20 spark.driver.memory 64g spark.executor.memory 30g spark.executor.cores 5 spark.default.parallelism 100 spark.sql.shuffle.partitions 100 spark.serializer org.apache.spark.serializer.KryoSerializer spark.driver.maxResultSize0 spark.ui.enabled false spark.driver.extraJavaOptions -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=512M spark.executor.extraJavaOptions -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M spark.cleaner.referenceTracking.blocking true spark.cleaner.referenceTracking.blocking.shuffle true ``` The test results are as follows | [SPARK-19991](https://github.com/witgo/spark/tree/SPARK-19991) |https://github.com/apache/spark/commit/68ea290b3aa89b2a539d13ea2c18bdb5a651b2bf| |---| --- | |226.09 s| 235.21 s| ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-19991 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17329.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17329 commit abcfc79991ecd1d5cef2cd1e275b872695ba19d9 Author: Guoqiang Li Date: 2017-03-17T03:19:37Z FileSegmentManagedBuffer performance improvement --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org