[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20667 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r171136135 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,17 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + /** + * Here we set max cache size as 1.Since the size of a BlockManagerId object --- End diff -- nit: ``` The max cache size is hardcoded to 1; since the size of a BlockManagerId object is about 48B, the total memory cost should be below 1MB, which is feasible. ```
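The change under discussion replaces an unbounded map with a size-bounded interning cache. As a rough illustration of that idea, here is a JDK-only Java sketch (the class name is hypothetical; Spark's actual fix uses Guava's `CacheBuilder`):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a size-bounded interning cache, approximating what
// Guava's CacheBuilder.maximumSize(...) provides. Not thread-safe as written.
class BoundedInternCache<K> {
    private final Map<K, K> cache;

    BoundedInternCache(int maxSize) {
        // Access-ordered LinkedHashMap that evicts the eldest entry once
        // the size bound is exceeded (an LRU-style cap).
        this.cache = new LinkedHashMap<K, K>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, K> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Identity "loader": the canonical value for a key is the key itself,
    // mirroring `override def load(id: BlockManagerId) = id` in the diff.
    K intern(K key) {
        K cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        cache.put(key, key);
        return key;
    }

    int size() {
        return cache.size();
    }
}
```

Unlike Guava's cache, this sketch is not safe for concurrent use; it only illustrates the size bound plus the identity-loader pattern from the patch.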
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r171121778 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) --- End diff -- Thanks @jiangxb1987, I have updated the comment.
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170884966 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) --- End diff -- Please feel free to update any comment, and I would set the default value to 1, since I think a cost of 500KB of memory is feasible.
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170881808 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) --- End diff -- Actually I think 500 executors can handle most applications. And for the history server there is no need to cache too many `BlockManagerId`s. If we set this number to 50, the max size of the cache will be below 30KB. Agree with that? @jiangxb1987 If OK I will update the documentation.
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170865897 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) --- End diff -- I'm +1 on hardcoding this, but we should also explain in the comment why this number was chosen.
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170864215 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() --- End diff -- I think it is thread-safe, based on: https://google.github.io/guava/releases/22.0/api/docs/com/google/common/cache/LoadingCache.html and https://stackoverflow.com/questions/11124856/using-guava-for-high-performance-thread-safe-caching
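For comparison, the interning pattern the original `ConcurrentHashMap`-based code relied on is also thread-safe, and can be sketched with the JDK alone (a Java sketch; the class name is hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal thread-safe identity-interning sketch using only the JDK.
// putIfAbsent returns the previously mapped value, or null if this call
// inserted the key, so concurrent callers converge on one canonical instance.
class ConcurrentInternPool {
    private final Map<String, String> pool = new ConcurrentHashMap<>();

    String intern(String value) {
        String prior = pool.putIfAbsent(value, value);
        return (prior != null) ? prior : value;
    }
}
```

The catch, and the motivation for this PR, is that such a pool grows without bound; Guava's `maximumSize` supplies the eviction that `ConcurrentHashMap` lacks.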
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170844718 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) --- End diff -- I'm OK with hardcoding it for now.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170844624 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() --- End diff -- Is it thread-safe?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170844541 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) +.build(new CacheLoader[BlockManagerId, BlockManagerId]() { + override def load(id: BlockManagerId) = { --- End diff -- nit: `override def load(id: BlockManagerId) = id`
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170817107 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = CacheBuilder.newBuilder() +.maximumSize(500) --- End diff -- Here I set 500 since a `BlockManagerId` is about `48B` per object. I do not use a Spark conf entry since it is not convenient to get the Spark conf in the history server when using `BlockManagerId`.
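The arithmetic behind these size estimates is easy to check: at roughly 48 bytes per `BlockManagerId`, 500 cached entries account for about 24KB of object payload (cache bookkeeping excluded). A quick back-of-the-envelope check in Java:

```java
public class CacheSizeEstimate {
    public static void main(String[] args) {
        long bytesPerId = 48;   // approximate size of one BlockManagerId, per the discussion
        long maxEntries = 500;  // maximumSize used in the patch
        long totalBytes = bytesPerId * maxEntries;
        System.out.println(totalBytes + " bytes"); // 24000 bytes, well under 1MB
    }
}
```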