[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-30 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/17329


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108881597
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Ping @witgo to update or close


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108067937
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

I'm simply describing what you proposed above at 
https://github.com/apache/spark/pull/17329#discussion_r107706851


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-25 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108049460
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Sorry,I didn't get your idea. Can you write some code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-25 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108032669
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

This still doesn't address the point. What you say is true even if you make 
the change I suggest, which is to remove the superfluous constructor. The 
performance is exactly the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-25 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108032378
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Suppose there are E Executor in the cluster, a shuffle process has M Map 
task, R reduce task, in the master branch will be created:

1. Up to M * R FileSegmentManagedBuffer instances
2. Up to 2 * M * R NoSuchElementException instances

in this PR will be created:

1. Up to M * R FileSegmentManagedBuffer instances
2. Up to 2 * NoSuchElementException instances (ExternalShuffleBlockResolver 
and IndexShuffleBlockResolver are created once executor starts and They call 
the new constructor to create a FileSegmentManagedBuffer instance)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-25 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108031525
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

I'm not sure what you're stating, that this is slow? but you also only made 
half the changes as you did here. I don't see what the argument is. Removing 
the constructor you added can't have an effect on performance either way. In 
either case, you access the value once in the constructor. 

I think all you're showing is that the first file changed here doesn't 
matter. You can revert this entirely too. It's ExternalBlockShufflerManager 
that matters.

Make one or the other change and this can proceed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-24 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r108027203
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

This branch [SPARK-19991_try2 
](https://github.com/witgo/spark/commits/SPARK-19991_try2) needs `244.45` s in 
my test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107846996
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

But if that's true, then the proposed changed to this class here already 
doesn't help anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-23 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107841105
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Yes,  But the above code does not optimize performance, 
`FileSegmentManagedBuffer.convertToNetty` method  is also called only once  in 
the master branch code . 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107707746
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Yes exactly. The point is you just check them once, right? this also 
accomplishes that goal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-23 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107706851
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Like the following code?

```java
  public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
this.lazyFileDescriptor = conf.lazyFileDescriptor();
this.memoryMapBytes = conf.memoryMapBytes();
this.file = file;
this.offset = offset;
this.length = length;
  }
```

the code `conf.lazyFileDescriptor();` or `conf.memoryMapBytes();`  creates 
a NoSuchElementException instance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107611158
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

No, why? I mean keep exactly the same constructors, no more no less. No 
code would change. You just set your two new fields in the current constructor. 
It actually means you don't need some of the changes you made here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-22 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107572297
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

That will change a lot of code, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-20 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r106848680
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Just don't add a new constructor. The existing one can set the new fields. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-18 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r106781598
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Oh, do you have a better idea?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-18 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r106778634
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long 
offset, long length) {
-this.conf = conf;
+this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, 
length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

Yeah that makes sense then but I don't think you need a new public 
constructor for this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-16 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/17329

[SPARK-19991]FileSegmentManagedBuffer performance improvement

FileSegmentManagedBuffer performance improvement.


## What changes were proposed in this pull request?

When we do not set the value of the configuration items 
`spark.storage.memoryMapThreshold` and `spark.shuffle.io.lazyFD`, 
each call to the cFileSegmentManagedBuffer.nioByteBuffer or 
FileSegmentManagedBuffer.createInputStream method creates a 
NoSuchElementException instance. This is a more time-consuming operation.

In the use case, this PR can improve the performance of about 3.5%

The test code:

``` scala

(1 to 10).foreach { i =>
  val numPartition = 1
  val rdd = sc.parallelize(0 until 
numPartition).repartition(numPartition).flatMap { t =>
(0 until numPartition).map(r => r * numPartition + t)
  }.repartition(numPartition)
  val serializeStart = System.currentTimeMillis()
  rdd.sum()
  val serializeFinish = System.currentTimeMillis()
  println(f"Test $i: ${(serializeFinish - serializeStart) / 1000D}%1.2f")
}


```

and `spark-defaults.conf` file:

```
spark.master  yarn-client
spark.executor.instances  20
spark.driver.memory   64g
spark.executor.memory 30g
spark.executor.cores  5
spark.default.parallelism 100 
spark.sql.shuffle.partitions  100
spark.serializer  
org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize0
spark.ui.enabled  false 
spark.driver.extraJavaOptions -XX:+UseG1GC 
-XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=512M 
spark.executor.extraJavaOptions   -XX:+UseG1GC 
-XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M 
spark.cleaner.referenceTracking.blocking  true
spark.cleaner.referenceTracking.blocking.shuffle  true

```

The test results are as follows

| [SPARK-19991](https://github.com/witgo/spark/tree/SPARK-19991) 
|https://github.com/apache/spark/commit/68ea290b3aa89b2a539d13ea2c18bdb5a651b2bf|
|---| --- | 
|226.09 s| 235.21 s|

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-19991

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17329


commit abcfc79991ecd1d5cef2cd1e275b872695ba19d9
Author: Guoqiang Li 
Date:   2017-03-17T03:19:37Z

FileSegmentManagedBuffer performance improvement




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org