GitHub user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16761636
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
               // Otherwise, cache the values and keep track of any updates in block statuses
               val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
               val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
    -          context.taskMetrics.updatedBlocks = Some(updatedBlocks)
    +          val metrics = context.taskMetrics
    +          val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
    +          metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
    --- End diff --
    
    @andrewor14 IMHO, "getOrCompute" can be called more than once per task (indirectly, through recursion), as in this code snippet:
    
           val rdd1 = sc.parallelize(...).cache()
           val rdd2 = rdd1.map(...).cache()
           val count = rdd2.count()
    
    This code snippet submits a single stage. Take task-1 as an example. Task-1 first calls getOrCompute(rdd-2), which in turn calls getOrCompute(rdd-1). It therefore generates and caches block rdd-1-1 and then block rdd-2-1. At the end of getOrCompute(rdd-1), the taskMetrics.updatedBlocks of task-1 will be Seq(rdd-1-1). With this change, at the end of getOrCompute(rdd-2), taskMetrics.updatedBlocks will be Seq(rdd-1-1, rdd-2-1). With the original assignment, the second call would instead overwrite the value set by the first and drop the rdd-1-1 entry.
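    A minimal sketch of the trace above, in plain Scala without Spark. FakeBlockId, FakeTaskMetrics, ToyRdd and CacheSim are hypothetical stand-ins invented for illustration, not Spark classes, and the block names follow the rdd-<rddId>-<partition> notation used in this comment. It models only the call order (the parent's getOrCompute finishes before the child records its own block) and the two ways of writing updatedBlocks: overwriting, as before the diff, versus appending, as in the diff.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins, NOT Spark's real BlockId / TaskMetrics classes.
final case class FakeBlockId(name: String)

final class FakeTaskMetrics {
  var updatedBlocks: Option[Seq[FakeBlockId]] = None
}

// A toy RDD: an id plus an optional cached parent it is derived from.
final case class ToyRdd(id: Int, parent: Option[ToyRdd])

object CacheSim {
  // Mimics only the shape of getOrCompute: computing a cached RDD first recurses
  // into its cached parent, then records this RDD's own block in the metrics.
  def getOrCompute(rdd: ToyRdd, partition: Int, metrics: FakeTaskMetrics, accumulate: Boolean): Unit = {
    rdd.parent.foreach(p => getOrCompute(p, partition, metrics, accumulate))
    // Blocks updated by this call only, like the local buffer in the diff.
    val updatedBlocks = ArrayBuffer(FakeBlockId(s"rdd-${rdd.id}-$partition"))
    if (accumulate) {
      // Patched behaviour: append to whatever earlier calls already recorded.
      val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq.empty[FakeBlockId])
      metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
    } else {
      // Original behaviour: overwrite, discarding entries from earlier calls.
      metrics.updatedBlocks = Some(updatedBlocks.toSeq)
    }
  }

  // Runs one "task" over partition 1 of rdd2 = rdd1.map(...), both cached.
  def run(accumulate: Boolean): List[String] = {
    val rdd1 = ToyRdd(1, None)
    val rdd2 = ToyRdd(2, Some(rdd1))
    val metrics = new FakeTaskMetrics
    getOrCompute(rdd2, 1, metrics, accumulate)
    metrics.updatedBlocks.getOrElse(Seq.empty[FakeBlockId]).map(_.name).toList
  }
}

println(CacheSim.run(accumulate = false)) // prints List(rdd-2-1)
println(CacheSim.run(accumulate = true))  // prints List(rdd-1-1, rdd-2-1)
```

    Run as a script, the overwrite variant ends with only rdd-2-1 recorded, while the accumulating variant keeps both blocks, which is the difference the change above is about.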

