Andrew Or created SPARK-1538:
--------------------------------

             Summary: SparkUI forgets about all persisted RDDs not associated with stages
                 Key: SPARK-1538
                 URL: https://issues.apache.org/jira/browse/SPARK-1538
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 0.9.1
            Reporter: Andrew Or
            Priority: Blocker
             Fix For: 1.0.0


The following command creates two RDDs in one Stage:

sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count

More specifically, parallelize creates one RDD and map creates another. If we 
persist only the first one, as in the command above, it never shows up on the 
StorageTab of the SparkUI.

This is because StageInfo keeps information only for the last RDD associated 
with the stage and discards all of that RDD's parents. The proposal here is to 
have StageInfo walk up the RDD dependency chain and keep a list of the RDDInfos 
for every RDD associated with the stage.
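
A minimal sketch of the proposed traversal, for illustration only. SimpleRDD, 
SimpleRDDInfo, and StageRDDs are hypothetical stand-ins, not Spark's actual 
RDD, RDDInfo, or StageInfo classes. The sketch also ignores stage boundaries 
and simply follows every parent.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's classes (assumptions, not the real API).
final case class SimpleRDD(
    id: Int,
    name: String,
    persisted: Boolean,
    parents: Seq[SimpleRDD] = Nil)

final case class SimpleRDDInfo(id: Int, name: String, persisted: Boolean)

object StageRDDs {
  /** Collects info for `last` and for every ancestor reachable through its parents. */
  def collectInfos(last: SimpleRDD): Seq[SimpleRDDInfo] = {
    // Keyed by RDD id so an RDD shared by several children is recorded only once;
    // LinkedHashMap keeps the order in which RDDs were first visited.
    val seen = mutable.LinkedHashMap.empty[Int, SimpleRDDInfo]

    def visit(rdd: SimpleRDD): Unit = {
      if (!seen.contains(rdd.id)) {
        seen(rdd.id) = SimpleRDDInfo(rdd.id, rdd.name, rdd.persisted)
        rdd.parents.foreach(visit)
      }
    }

    visit(last)
    seen.values.toSeq
  }
}
```

Modelling the command above as a persisted "parallelize" RDD with an 
unpersisted "map" child, StageRDDs.collectInfos called on the child returns 
entries for both RDDs, so the persisted parent is no longer lost.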



