[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718004#comment-16718004 ]

ASF GitHub Bot commented on SPARK-24920:


ankuriitg commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240792081
 
 

 ##
 File path: common/network-common/src/main/java/org/apache/spark/network/util/NettyUtils.java
 ##
 @@ -95,6 +111,38 @@ public static String getRemoteAddress(Channel channel) {
     return "";
   }
 
+  /**
+   * Returns the default number of threads for both the Netty client and server thread pools.
+   * If numUsableCores is 0, we will use Runtime to get an approximate number of available cores.
+   */
+  public static int defaultNumThreads(int numUsableCores) {
+    final int availableCores;
+    if (numUsableCores > 0) {
+      availableCores = numUsableCores;
+    } else {
+      availableCores = Runtime.getRuntime().availableProcessors();
+    }
+    return Math.min(availableCores, MAX_DEFAULT_NETTY_THREADS);
+  }
+
+  /**
+   * Returns the lazily created shared pooled ByteBuf allocator for the specified allowCache
+   * parameter value.
+   */
+  public static synchronized PooledByteBufAllocator getSharedPooledByteBufAllocator(
+      boolean allowDirectBufs,
+      boolean allowCache) {
+    final int index = allowCache ? 0 : 1;
 
 Review comment:
   I don't think you need to change createPooledByteBufAllocator if you change this method. In createPooledByteBufAllocator, allowCache only determines whether to create a pool with or without a cache. getSharedPooledByteBufAllocator is different: it uses allowCache for that purpose and, in addition, to decide whether to return the pooled allocator for a client or for a server.
   
   You may still want to keep this change if allowCache is the only thing that distinguishes the two pooled allocators, but if it is not, it may be better to introduce another variable.
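
   For context, the two-slot lazy cache being reviewed looks roughly like the sketch below. The class and method names (`SharedAllocators`, `getShared`, `newAllocator`) are invented for illustration, and `Object` stands in for Netty's `PooledByteBufAllocator`; only the indexing pattern mirrors the PR.

```java
public class SharedAllocators {
  // Two slots: index 0 caches the allocator created with allowCache = true,
  // index 1 the one created with allowCache = false.
  private static final Object[] allocators = new Object[2];

  public static synchronized Object getShared(boolean allowDirectBufs, boolean allowCache) {
    final int index = allowCache ? 0 : 1;
    if (allocators[index] == null) {
      // Created lazily on first request for this allowCache value.
      allocators[index] = newAllocator(allowDirectBufs, allowCache);
    }
    return allocators[index];
  }

  // Stand-in for createPooledByteBufAllocator(...); not the PR's actual code.
  private static Object newAllocator(boolean allowDirectBufs, boolean allowCache) {
    return new Object();
  }

  public static void main(String[] args) {
    // Same flags hit the same slot and return the cached instance.
    System.out.println(SharedAllocators.getShared(true, true) == SharedAllocators.getShared(true, true));
    // Different allowCache values live in different slots.
    System.out.println(SharedAllocators.getShared(true, false) == SharedAllocators.getShared(true, true));
  }
}
```

   The review question is exactly about this indexing: whether the slot index should be derived from allowCache or from a dedicated client/server flag.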


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark should allow sharing netty's memory pools across all uses
> ---
>
> Key: SPARK-24920
> URL: https://issues.apache.org/jira/browse/SPARK-24920
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>  Labels: memory-analysis
>
> Spark currently creates separate netty memory pools for each of the following 
> "services":
> 1) RPC Client
> 2) RPC Server
> 3) BlockTransfer Client
> 4) BlockTransfer Server
> 5) ExternalShuffle Client
> Depending on configuration and whether it's an executor or driver JVM, a 
> different subset of these is active, but it's always either 3 or 4 of them.
> Having them independent somewhat defeats the purpose of using pools at all.  
> In my experiments I've found each pool will grow due to a burst of activity 
> in the related service (e.g. task start/end msgs), followed by another burst in 
> a different service (e.g. sending torrent broadcast blocks).  Because of the 
> way these pools work, they allocate memory in large chunks (16 MB by default) 
> for each netty thread, so there is often a surge of 128 MB of allocated 
> memory, even for really tiny messages.  Also, a lot of this memory is offheap 
> by default, which makes it even tougher for users to manage.
> I think it would make more sense to combine all of these into a single pool.  
> In some experiments I tried, this noticeably decreased memory usage, both 
> onheap and offheap (no significant performance effect in my small 
> experiments).
> As this is a pretty core change, as a first step I'd propose just exposing 
> this as a conf, to let users experiment more broadly across a wider range of 
> workloads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717967#comment-16717967 ]

ASF GitHub Bot commented on SPARK-24920:


attilapiros commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240782894
 
 

 (Same NettyUtils.java hunk as quoted above.)
 Review comment:
   I see, thanks. 
   
   With volatile it would be fine:
   
http://www.java67.com/2016/04/why-double-checked-locking-was-broken-before-java5.html
 





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717912#comment-16717912 ]

ASF GitHub Bot commented on SPARK-24920:


vanzin commented on a change in pull request #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240774275
 
 

 (Same NettyUtils.java hunk as quoted above.)
 Review comment:
   What that article talks about was fixed in Java 1.5. Double-checked locking 
works fine since then.
   
   But I think that's overkill here.
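
   For context, the post-Java 5 form of the pattern being debated requires the shared field to be `volatile`. Below is a minimal sketch; `AllocatorHolder`, `getSharedAllocator`, and `createAllocator` are invented names, and `Object` stands in for Netty's `PooledByteBufAllocator`.

```java
public class AllocatorHolder {
  // volatile is what makes double-checked locking correct on Java 5+ (JSR-133):
  // without it, another thread could observe a partially constructed object.
  private static volatile Object sharedAllocator;

  public static Object getSharedAllocator() {
    Object local = sharedAllocator;
    if (local == null) {                     // first check, no lock taken
      synchronized (AllocatorHolder.class) {
        local = sharedAllocator;
        if (local == null) {                 // second check, under the lock
          local = createAllocator();
          sharedAllocator = local;
        }
      }
    }
    return local;
  }

  // Stand-in for new PooledByteBufAllocator(...); illustrative only.
  private static Object createAllocator() {
    return new Object();
  }

  public static void main(String[] args) {
    // Repeated calls return the same cached instance.
    System.out.println(AllocatorHolder.getSharedAllocator() == AllocatorHolder.getSharedAllocator());
  }
}
```

   The trade-off raised in the review still stands: for a method called only a handful of times at service startup, plain method synchronization is simpler and the locking cost is negligible.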


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717921#comment-16717921 ]

ASF GitHub Bot commented on SPARK-24920:


attilapiros commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240775425
 
 

 (Same NettyUtils.java hunk as quoted above.)
 Review comment:
   Thanks for reviewing it!
   
   Introducing a new parameter like `isClient` would just hide the fact that the value of allowCache differs, and the callers already work at this level of abstraction when they call `createPooledByteBufAllocator`, e.g.:
   
   [TransportServer.java#L77](https://github.com/apache/spark/blob/ae8fa8e81510c69f8c368cad25788f5423b4/common/network-common/src/main/java/org/apache/spark/network/server/TransportServer.java#L77)
   
   So I would keep this (or change createPooledByteBufAllocator too).


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717897#comment-16717897 ]

ASF GitHub Bot commented on SPARK-24920:


attilapiros commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240771800
 
 

 (Same NettyUtils.java hunk as quoted above.)
 Review comment:
   I once read this article (which is why I try to avoid using double-checked locking):
   
https://www.javaworld.com/article/2074979/java-concurrency/double-checked-locking--clever--but-broken.html


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717849#comment-16717849 ]

ASF GitHub Bot commented on SPARK-24920:


ankuriitg commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240755466
 
 

 (Same NettyUtils.java hunk as quoted above.)
 Review comment:
   I am not sure it is a good idea to piggyback on allowCache to determine whether it is a client or a server pooled allocator. Maybe use another variable?


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717850#comment-16717850 ]

ASF GitHub Bot commented on SPARK-24920:


ankuriitg commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240757669
 
 

 (Same NettyUtils.java hunk as quoted above.)
 Review comment:
   Maybe use double-checked locking instead of method synchronization: the instantiation only needs to happen once, but synchronizing the whole method unnecessarily blocks all later calls.


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717719#comment-16717719 ]

ASF GitHub Bot commented on SPARK-24920:


vanzin commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446309870
 
 
   LGTM, will leave it here a bit to see if anyone else comments.


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716104#comment-16716104 ]

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446062090
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99938/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark should allow sharing netty's memory pools across all uses
> ---
>
> Key: SPARK-24920
> URL: https://issues.apache.org/jira/browse/SPARK-24920
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>  Labels: memory-analysis
>
> Spark currently creates separate Netty memory pools for each of the following 
> "services":
> 1) RPC Client
> 2) RPC Server
> 3) BlockTransfer Client
> 4) BlockTransfer Server
> 5) ExternalShuffle Client
> Depending on configuration and whether it's an executor or driver JVM, a 
> different subset of these is active, but it's always either 3 or 4.
> Having them independent somewhat defeats the purpose of using pools at all.  
> In my experiments I've found each pool will grow due to a burst of activity 
> in the related service (e.g. task start / end msgs), followed by another 
> burst in a different service (e.g. sending torrent broadcast blocks).  
> Because of the way these pools work, they allocate memory in large chunks 
> (16 MB by default) for each Netty thread, so there is often a surge of 
> 128 MB of allocated memory, even for really tiny messages.  Also, a lot of 
> this memory is off-heap by default, which makes it even tougher for users 
> to manage.
> I think it would make more sense to combine all of these into a single pool.  
> In some experiments I tried, this noticeably decreased memory usage, both 
> on-heap and off-heap (no significant performance effect in my small 
> experiments).
> As this is a pretty core change, as a first step I'd propose just exposing 
> this as a conf, to let users experiment more broadly across a wider range of 
> workloads.
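
The 16 MB and 128 MB figures above follow from Netty's default arena chunk geometry (chunk size = pageSize << maxOrder). A minimal arithmetic sketch, assuming Netty's documented defaults of an 8 KiB page size and maxOrder 11 (this is illustrative arithmetic, not Spark or Netty code):

```java
// Sketch of where the "16 MB per chunk" / "surge of 128 MB" figures come
// from, assuming Netty's documented PooledByteBufAllocator defaults.
public class ChunkSizeSketch {
    public static void main(String[] args) {
        int pageSize = 8192;                  // assumed Netty default page size (8 KiB)
        int maxOrder = 11;                    // assumed Netty default max order
        int chunkSize = pageSize << maxOrder; // one chunk allocated per arena
        System.out.println(chunkSize);        // 16777216 bytes = 16 MB

        // With 8 Netty threads each touching its own arena, a burst of tiny
        // messages can still pin roughly 8 full chunks at once:
        System.out.println(8L * chunkSize);   // 134217728 bytes = 128 MB
    }
}
```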



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716103#comment-16716103
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446062086
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716099#comment-16716099
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446062086
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716100#comment-16716100
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446062090
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99938/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716095#comment-16716095
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446010375
 
 
   **[Test build #99938 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99938/testReport)**
 for PR 23278 at commit 
[`ae8fa8e`](https://github.com/apache/spark/commit/ae8fa8e81510c69f8c368cad25788f5423b4).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716094#comment-16716094
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446061759
 
 
   **[Test build #99938 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99938/testReport)**
 for PR 23278 at commit 
[`ae8fa8e`](https://github.com/apache/spark/commit/ae8fa8e81510c69f8c368cad25788f5423b4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715756#comment-16715756
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446011512
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715759#comment-16715759
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446011518
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5943/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715758#comment-16715758
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446011512
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715757#comment-16715757
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446011518
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5943/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715754#comment-16715754
 ] 

ASF GitHub Bot commented on SPARK-24920:


attilapiros commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240418730
 
 

 ##
 File path: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
 ##
 @@ -265,6 +265,16 @@ public boolean saslServerAlwaysEncrypt() {
 return conf.getBoolean("spark.network.sasl.serverAlwaysEncrypt", false);
   }
 
+  /**
+   * Flag indicating whether to share the pooled ByteBuf allocators between 
the different Netty
+   * channels. If enabled then only two pooled ByteBuf allocators are created: 
one where caching
+   * is allowed (for transport servers) and one where not (for transport 
clients).
+   * When disabled a new allocator is created for each transport servers and 
clients.
+   */
+  public Boolean sharedByteBufAllocators() {
+return conf.getBoolean("spark.network.sharedByteBufAllocators", true);
 
 Review comment:
   Thanks! I've renamed it.
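
The quoted `sharedByteBufAllocators()` accessor above uses the usual look-up-with-default conf pattern, defaulting the new flag to enabled. A minimal stand-in sketch of that pattern (the `Conf` class here is hypothetical, not Spark's actual `TransportConf`/`ConfigProvider`):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch of the boolean-conf-with-default pattern used by the
// patch's sharedByteBufAllocators(); Conf is hypothetical, not Spark code.
public class ConfSketch {
    static class Conf {
        private final Map<String, String> entries = new HashMap<>();

        void set(String key, String value) { entries.put(key, value); }

        // Fall back to the default when the key is unset.
        boolean getBoolean(String key, boolean defaultValue) {
            String v = entries.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        // Unset: allocator sharing defaults to true, as in the patch.
        System.out.println(conf.getBoolean("spark.network.sharedByteBufAllocators", true));
        // Users can opt out to fall back to per-transport allocators.
        conf.set("spark.network.sharedByteBufAllocators", "false");
        System.out.println(conf.getBoolean("spark.network.sharedByteBufAllocators", true));
    }
}
```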


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715752#comment-16715752
 ] 

ASF GitHub Bot commented on SPARK-24920:


attilapiros commented on a change in pull request #23278: [SPARK-24920][Core] 
Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240418509
 
 

 ##
 File path: 
common/network-common/src/main/java/org/apache/spark/network/util/NettyUtils.java
 ##
 @@ -95,6 +99,21 @@ public static String getRemoteAddress(Channel channel) {
 return "";
   }
 
+  /**
+   * Returns the lazily created shared pooled ByteBuf allocator for the 
specified allowCache
+   * parameter value.
+   */
+  public static synchronized PooledByteBufAllocator 
getSharedPooledByteBufAllocator(
+  boolean allowDirectBufs,
+  boolean allowCache) {
+final int index = allowCache ? 0 : 1;
+if (_sharedPooledByteBufAllocator[index] == null) {
+  _sharedPooledByteBufAllocator[index] =
+createPooledByteBufAllocator(allowDirectBufs, allowCache, 0 /* 
numCores */);
 
 Review comment:
   I have moved  `MAX_DEFAULT_NETTY_THREADS` and `defaultNumThreads` from core 
to network-common (because of the dependency direction).
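
The diff under discussion lazily creates at most two shared allocators, one per `allowCache` value, guarded by `synchronized`. A self-contained sketch of that two-slot lazy-singleton pattern (`Allocator` below is a stand-in for Netty's `PooledByteBufAllocator`, which is not reproduced here):

```java
// Minimal sketch of the lazy two-slot cache from the patch's
// getSharedPooledByteBufAllocator: one shared allocator per allowCache
// value, created on first use and reused afterwards.
public class SharedAllocatorSketch {
    // Stand-in for Netty's PooledByteBufAllocator.
    static class Allocator {
        final boolean allowCache;
        Allocator(boolean allowCache) { this.allowCache = allowCache; }
    }

    private static final Allocator[] shared = new Allocator[2];

    // synchronized guards lazy initialization of both slots.
    public static synchronized Allocator getShared(boolean allowCache) {
        int index = allowCache ? 0 : 1;
        if (shared[index] == null) {
            shared[index] = new Allocator(allowCache);
        }
        return shared[index];
    }

    public static void main(String[] args) {
        // Repeated lookups with the same flag return the same instance...
        System.out.println(getShared(true) == getShared(true));   // true
        // ...while the two flag values map to distinct allocators.
        System.out.println(getShared(true) == getShared(false));  // false
    }
}
```

This is why the review thread notes that `allowCache` now does double duty: it selects caching behavior inside the allocator and also which shared slot (server vs. client) is returned.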


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark should allow sharing netty's memory pools across all uses
> ---
>
> Key: SPARK-24920
> URL: https://issues.apache.org/jira/browse/SPARK-24920
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>  Labels: memory-analysis
>
> Spark currently creates separate netty memory pools for each of the following 
> "services":
> 1) RPC Client
> 2) RPC Server
> 3) BlockTransfer Client
> 4) BlockTransfer Server
> 5) ExternalShuffle Client
> Depending on configuration and on whether it's an executor or driver JVM, a 
> different subset of these is active, but it's always either 3 or 4.
> Having them independent somewhat defeats the purpose of using pools at all.  
> In my experiments I've found each pool will grow due to a burst of activity 
> in the related service (e.g. task start / end msgs), followed by another 
> burst in a different service (e.g. sending torrent broadcast blocks).  
> Because of the way these pools work, they allocate memory in large chunks 
> (16 MB by default) for each netty thread, so there is often a surge of 
> 128 MB of allocated memory, even for really tiny messages.  Also a lot of 
> this memory is offheap by default, which makes it even tougher for users to 
> manage.
> I think it would make more sense to combine all of these into a single pool.  
> In some experiments I tried, this noticeably decreased memory usage, both 
> onheap and offheap (no significant performance effect in my small 
> experiments).
> As this is a pretty core change, as a first step I'd propose just exposing 
> this as a conf, to let users experiment more broadly across a wider range of 
> workloads.
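The surge the issue describes follows directly from the pool arithmetic. A back-of-the-envelope sketch, using the defaults quoted above (the pool count of 4 is the upper end of the "always either 3 or 4" case and is illustrative):

```java
// Back-of-the-envelope arithmetic for the memory surge described in the issue.
// Chunk size and thread cap are the defaults mentioned above; the pool count
// is the "always either 3 or 4" case, taken at its upper end.
final class PoolSurgeSketch {
  public static void main(String[] args) {
    long chunkBytes = 16L * 1024 * 1024; // Netty's default chunk size: 16 MB
    int nettyThreads = 8;                // one chunk may be allocated per netty thread
    int independentPools = 4;            // separate pools active in one JVM

    long perPoolSurgeMb = chunkBytes * nettyThreads / (1024 * 1024);
    System.out.println(perPoolSurgeMb + " MB surge per pool");
    System.out.println(perPoolSurgeMb * independentPools + " MB worst case across pools");
  }
}
```

That is 128 MB per pool, and since each independent pool can surge this way in turn, sharing a single pool lets one burst reuse the chunks left behind by the previous one instead of stacking new allocations.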



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715749#comment-16715749
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-446010375
 
 
   **[Test build #99938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99938/testReport)** for PR 23278 at commit [`ae8fa8e`](https://github.com/apache/spark/commit/ae8fa8e81510c69f8c368cad25788f5423b4).





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715685#comment-16715685
 ] 

ASF GitHub Bot commented on SPARK-24920:


vanzin commented on a change in pull request #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240405680
 
 

 ##
 File path: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
 ##
 @@ -265,6 +265,16 @@ public boolean saslServerAlwaysEncrypt() {
 return conf.getBoolean("spark.network.sasl.serverAlwaysEncrypt", false);
   }
 
+  /**
+   * Flag indicating whether to share the pooled ByteBuf allocators between the different Netty
+   * channels. If enabled, only two pooled ByteBuf allocators are created: one where caching
+   * is allowed (for transport servers) and one where it is not (for transport clients).
+   * When disabled, a new allocator is created for each transport server and client.
+   */
+  public Boolean sharedByteBufAllocators() {
+    return conf.getBoolean("spark.network.sharedByteBufAllocators", true);
 
 Review comment:
   More standard naming would be "spark.network.sharedAllocators.enabled".





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715684#comment-16715684
 ] 

ASF GitHub Bot commented on SPARK-24920:


vanzin commented on a change in pull request #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#discussion_r240405360
 
 

 ##
 File path: 
common/network-common/src/main/java/org/apache/spark/network/util/NettyUtils.java
 ##
 @@ -95,6 +99,21 @@ public static String getRemoteAddress(Channel channel) {
 return "";
   }
 
+  /**
+   * Returns the lazily created shared pooled ByteBuf allocator for the specified allowCache
+   * parameter value.
+   */
+  public static synchronized PooledByteBufAllocator getSharedPooledByteBufAllocator(
+      boolean allowDirectBufs,
+      boolean allowCache) {
+    final int index = allowCache ? 0 : 1;
+    if (_sharedPooledByteBufAllocator[index] == null) {
+      _sharedPooledByteBufAllocator[index] =
+        createPooledByteBufAllocator(allowDirectBufs, allowCache, 0 /* numCores */);
 
 Review comment:
   Hmm.. it may be good to think about having a better way to define the number 
of cores here. The issue is that by using the default you may be wasting 
resources.
   
   e.g. if your container is only requesting 1 CPU but the host actually has 32 
CPUs, this will create 64 allocation arenas.
   
   (For example, `SparkTransportConf.fromSparkConf` tries to limit thread pool 
sizes and thus the size of the allocators by using the configured number of 
CPUs.)
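   To make the sizing concern concrete, here is a hedged sketch of deriving the arena count from a configured core count rather than the host's, in the spirit of what `SparkTransportConf.fromSparkConf` does for thread pools. The cap of 8 and the helper name are illustrative, not Spark's actual code.

```java
// Illustrative sketch of the sizing concern raised above: sizing from a
// configured core count, so a 1-CPU container on a 32-CPU host does not get
// a host-sized allocator. The helper name and the cap of 8 are assumptions.
final class ArenaSizingSketch {
  static int arenaCount(int configuredCores, int hostCores) {
    // Prefer the explicitly configured value when one is given (> 0);
    // fall back to the host CPU count only when nothing is configured.
    int cores = configuredCores > 0 ? configuredCores : hostCores;
    return Math.min(cores, 8); // cap, as with the default netty thread count
  }

  public static void main(String[] args) {
    System.out.println(arenaCount(0, 32)); // nothing configured: host-based, capped
    System.out.println(arenaCount(1, 32)); // container requested 1 CPU: 1 arena
  }
}
```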
   
   





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715648#comment-16715648
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445991596
 
 
   Merged build finished. Test PASSed.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715642#comment-16715642
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445903508
 
 
   **[Test build #99927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99927/testReport)** for PR 23278 at commit [`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715639#comment-16715639
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445990721
 
 
   Merged build finished. Test PASSed.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715649#comment-16715649
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445991602
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99927/
   Test PASSed.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715647#comment-16715647
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445991602
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99927/
   Test PASSed.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715646#comment-16715646
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445991596
 
 
   Merged build finished. Test PASSed.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715641#comment-16715641
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445991008
 
 
   **[Test build #99927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99927/testReport)** for PR 23278 at commit [`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715640#comment-16715640
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445990724
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99928/
   Test PASSed.





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715638#comment-16715638
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445990724
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99928/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715637#comment-16715637
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445990721
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715636#comment-16715636
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445905307
 
 
   **[Test build #99928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99928/testReport)**
 for PR 23278 at commit 
[`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715634#comment-16715634
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445990056
 
 
   **[Test build #99928 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99928/testReport)**
 for PR 23278 at commit 
[`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715200#comment-16715200
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445902818
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715211#comment-16715211
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445905307
 
 
   **[Test build #99928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99928/testReport)**
 for PR 23278 at commit 
[`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715210#comment-16715210
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445905081
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5934/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715209#comment-16715209
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445905073
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715207#comment-16715207
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445905073
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715208#comment-16715208
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445905081
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5934/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715204#comment-16715204
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445902818
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Spark should allow sharing netty's memory pools across all uses
> ---
>
> Key: SPARK-24920
> URL: https://issues.apache.org/jira/browse/SPARK-24920
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>  Labels: memory-analysis
>
> Spark currently creates separate netty memory pools for each of the following 
> "services":
> 1) RPC Client
> 2) RPC Server
> 3) BlockTransfer Client
> 4) BlockTransfer Server
> 5) ExternalShuffle Client
> Depending on configuration and whether its an executor or driver JVM, 
> different of these are active, but its always either 3 or 4.
> Having them independent somewhat defeats the purpose of using pools at all.  
> In my experiments I've found each pool will grow due to a burst of activity 
> in the related service (eg. task start / end msgs), followed another burst in 
> a different service (eg. sending torrent broadcast blocks).  Because of the 
> way these pools work, they allocate memory in large chunks (16 MB by default) 
> for each netty thread, so there is often a surge of 128 MB of allocated 
> memory, even for really tiny messages.  Also a lot of this memory is offheap 
> by default, which makes it even tougher for users to manage.
> I think it would make more sense to combine all of these into a single pool.  
> In some experiments I tried, this noticeably decreased memory usage, both 
> onheap and offheap (no significant performance effect in my small 
> experiments).
> As this is a pretty core change, as a first step I'd propose just exposing 
> this as a conf, to let users experiment more broadly across a wider range of 
> workloads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715202#comment-16715202
 ] 

ASF GitHub Bot commented on SPARK-24920:


SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445903508
 
 
   **[Test build #99927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99927/testReport)**
 for PR 23278 at commit 
[`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715203#comment-16715203
 ] 

ASF GitHub Bot commented on SPARK-24920:


vanzin commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's 
memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445903549
 
 
   add to whitelist





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715201#comment-16715201
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow 
sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445902681
 
 
   Can one of the admins verify this patch?





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715198#comment-16715198
 ] 

ASF GitHub Bot commented on SPARK-24920:


AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278#issuecomment-445902681
 
 
   Can one of the admins verify this patch?





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715190#comment-16715190
 ] 

ASF GitHub Bot commented on SPARK-24920:


attilapiros opened a new pull request #23278: [SPARK-24920][Core] Allow sharing 
Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278
 
 
   ## What changes were proposed in this pull request?
   
   Introducing shared pooled ByteBuf allocators. 
   This feature can be enabled via the "spark.network.sharedByteBufAllocators" 
configuration. 
   
   When it is on then only two pooled ByteBuf allocators are created: 
   - one for transport servers where caching is allowed and
   - one for transport clients where caching is disabled
   
   This way the cache allowance remains as before. 
   Both shareable pools are created with the numCores parameter set to 0 (which 
defaults to the number of available processors), as conf.serverThreads() and 
conf.clientThreads() are module-dependent and the lazy creation of these 
allocators would otherwise lead to unpredictable behaviour.
   
   When "spark.network.sharedByteBufAllocators" is false, a new allocator is 
created for every transport client and server separately, as before this 
PR.
   
   ## How was this patch tested?
   
   Existing unit tests.
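   The numCores = 0 fallback mentioned above can be sketched as follows. This mirrors the defaultNumThreads helper in the patch under review; the cap of 8 is NettyUtils' MAX_DEFAULT_NETTY_THREADS, and the class name here is just for illustration:

```java
// Sketch of the numCores = 0 fallback described above: 0 means "ask the
// JVM for the processor count", and the result is capped at NettyUtils'
// MAX_DEFAULT_NETTY_THREADS (8), so an allocator created lazily by any
// module ends up with the same arena count.
public class DefaultNumThreads {
    private static final int MAX_DEFAULT_NETTY_THREADS = 8;

    public static int defaultNumThreads(int numUsableCores) {
        final int availableCores = numUsableCores > 0
            ? numUsableCores
            : Runtime.getRuntime().availableProcessors();
        return Math.min(availableCores, MAX_DEFAULT_NETTY_THREADS);
    }

    public static void main(String[] args) {
        System.out.println(defaultNumThreads(4));   // 4
        System.out.println(defaultNumThreads(32));  // capped at 8
        System.out.println(defaultNumThreads(0));   // min(availableProcessors, 8)
    }
}
```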
   





[jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

2018-12-10 Thread Attila Zsolt Piros (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715040#comment-16715040
 ] 

Attila Zsolt Piros commented on SPARK-24920:


I started to work on it.

I would keep a separate memory pool for transport clients and transport 
servers. 
This way the cache allowance for each pool would remain as before the change: 
for transport servers it would be on, and for transport clients it would be off. 
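That client/server split amounts to a lazy two-slot cache keyed on whether Netty's thread-local caching is allowed. Here is a dependency-free sketch of that shape; a plain Object stands in for Netty's PooledByteBufAllocator, and the class and method names are illustrative rather than taken from the final patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Dependency-free sketch of a lazy two-slot allocator cache: index 0 holds
// the pool with thread-local caching allowed (servers), index 1 the pool
// with caching disabled (clients). A plain Object stands in for Netty's
// PooledByteBufAllocator; names here are illustrative.
public final class SharedPoolSketch {
    private static final Object[] pools = new Object[2];
    static final AtomicInteger creations = new AtomicInteger();

    static synchronized Object getShared(boolean allowDirectBufs, boolean allowCache) {
        int index = allowCache ? 0 : 1;
        if (pools[index] == null) {
            // The real code would call createPooledByteBufAllocator(...)
            // here with numCores = 0, per the PR description.
            creations.incrementAndGet();
            pools[index] = "pool(directBufs=" + allowDirectBufs + ", cache=" + allowCache + ")";
        }
        return pools[index];
    }

    public static void main(String[] args) {
        Object serverPool = getShared(true, true);
        Object clientPool = getShared(true, false);
        // Repeated lookups reuse the same instances; only two pools ever exist.
        System.out.println(serverPool == getShared(true, true));   // true
        System.out.println(clientPool == getShared(true, false));  // true
        System.out.println(creations.get());                       // 2
    }
}
```

However many transport clients and servers a JVM hosts, at most two allocators are created, while each side keeps the same cache behaviour it had with per-service pools.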

 
