[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20667


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r171136135
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,17 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  /**
+   * Here we set max cache size as 1.Since the size of a 
BlockManagerId object
--- End diff --

nit:
```
The max cache size is hardcoded to 1, since the size of a 
BlockManagerId object is about 48B, the total memory cost should be below 1MB 
which is feasible.
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r171121778
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
--- End diff --

Thanks @jiangxb1987 i have updated the comment


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170884966
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
--- End diff --

Please feel free to update any comment, and I would set the default value 
to 1 since I think cost of 500KB memory is feasible.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170881808
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
--- End diff --

Actually i think 500 executors can handle most applications.And for 
historyserver it is no need to cache too much `BlockManagerId`.If we set this 
number as 50 the max size of cache will below 30KB.
Agree with that? @jiangxb1987 If ok i will update documentation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170865897
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
--- End diff --

I‘m +1 on hardcode this, but we shall also explain in the comment the 
reason why this number is chosen.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170864215
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
--- End diff --

I think it is thread-safe which i refer from:

https://google.github.io/guava/releases/22.0/api/docs/com/google/common/cache/LoadingCache.html
and 
https://stackoverflow.com/questions/11124856/using-guava-for-high-performance-thread-safe-caching


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170844718
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
--- End diff --

I'm ok to hardcode it for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170844624
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
--- End diff --

 is it thread-safe?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170844541
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
+.build(new CacheLoader[BlockManagerId, BlockManagerId]() {
+  override def load(id: BlockManagerId) = {
--- End diff --

nit: `override def load(id: BlockManagerId) = id`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-26 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170817107
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+.maximumSize(500)
--- End diff --

here i set 500
since `blockmanagerId` about `48B` per object.
I do not use spark conf since it is not convenient to get spark conf for 
historyserver when use BlockManagerId


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org