Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170424196
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true)
     
    -  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
    +  def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId =
    +  {
         blockManagerIdCache.putIfAbsent(id, id)
    -    blockManagerIdCache.get(id)
    +    val blockManagerId = blockManagerIdCache.get(id)
    +    if (clearOldValues) {
    +      blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d"))
    --- End diff ---
    
    10 days? I don't think *time* can be a judging criterion for deciding whether we should remove a cached id or not, no matter whether you set the time threshold far below or above '10d'. Consider an extreme case where a block is fetched frequently for the whole lifetime of the app: its cached id would still be evicted once the time threshold is reached, then re-cached the next time we get it, and so on, over and over.
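    To illustrate the alternative being argued for: eviction driven by capacity and access recency (LRU) never thrashes a frequently-used entry, unlike a pure age cutoff. Below is a minimal, hypothetical sketch in plain Java using `LinkedHashMap` in access order; the class and bound are illustrative, not Spark's actual implementation.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch: a size-bounded LRU interning cache. An entry that
    // is accessed all the time stays resident; only the least-recently-used
    // entries are dropped, and only when the capacity bound is exceeded.
    public class LruIdCache<K> {
        private static final int MAX_ENTRIES = 10_000; // illustrative bound

        // accessOrder=true: iteration order is least-recently-accessed first,
        // which is what removeEldestEntry evicts from.
        private final Map<K, K> cache =
            new LinkedHashMap<K, K>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, K> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

        // Return the canonical cached instance for id, caching it if absent.
        public synchronized K getCached(K id) {
            K cached = cache.putIfAbsent(id, id);
            return cached != null ? cached : id;
        }

        public synchronized int size() {
            return cache.size();
        }

        public static void main(String[] args) {
            LruIdCache<String> ids = new LruIdCache<>();
            String a = new String("blockManager-1");
            String b = new String("blockManager-1"); // equal but distinct object
            String canonical = ids.getCached(a);
            // Both lookups resolve to the same canonical instance, and the
            // entry is never aged out no matter how long the app runs.
            System.out.println(canonical == ids.getCached(b)); // prints "true"
            System.out.println(ids.size());                    // prints "1"
        }
    }
    ```

    `LinkedHashMap` is single-threaded, hence the `synchronized` methods here; a concurrent size-bounded cache (e.g. Guava's `CacheBuilder` with `maximumSize`) would be the more realistic choice in Spark's code.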



---
