Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170424196

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
     getCachedBlockManagerId(obj)
   }

-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
+  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true)

-  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
+  def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId =
+  {
     blockManagerIdCache.putIfAbsent(id, id)
-    blockManagerIdCache.get(id)
+    val blockManagerId = blockManagerIdCache.get(id)
+    if (clearOldValues) {
+      blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d"))
--- End diff --

10 days? I don't think *time* is the right criterion for deciding whether a cached id should be evicted, no matter how far below or above '10d' you set the threshold. Consider an extreme case: a block whose id is fetched frequently for the app's whole lifetime. It would still be evicted once the time threshold expires, re-cached on the next lookup, evicted again, and so on, repeatedly.
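If the goal is just to bound the cache, an access-based policy (a size-capped, roughly-LRU cache) seems like a better fit than wall-clock age: a hot id stays cached however old it is, and only ids that actually fall out of use get dropped. A minimal sketch using Guava's `CacheBuilder` — the `10000` maximum size is an illustrative placeholder, not a tuned value:

```scala
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

// Size-bounded cache: once the entry count hits the limit, Guava evicts
// entries that have been used least recently, so a frequently fetched id
// is never dropped just because it is old. The 10000 cap is an assumed
// placeholder for illustration.
val blockManagerIdCache: LoadingCache[BlockManagerId, BlockManagerId] =
  CacheBuilder.newBuilder()
    .maximumSize(10000)
    .build(new CacheLoader[BlockManagerId, BlockManagerId]() {
      // Cache the canonical instance: loading a missing key just stores
      // the key itself as its own value.
      override def load(id: BlockManagerId): BlockManagerId = id
    })

def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
  // get() atomically returns the cached instance or loads (and caches) it,
  // replacing the putIfAbsent/get pair in the current code.
  blockManagerIdCache.get(id)
}
```

This also keeps `getCachedBlockManagerId` free of the extra `clearOldValues` flag, since eviction happens inside the cache rather than at call sites.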