[ https://issues.apache.org/jira/browse/SPARK-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442259#comment-16442259 ]
Apache Spark commented on SPARK-24011:
--------------------------------------

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/21096

> Cache rdd's immediate parent ShuffleDependencies to accelerate getShuffleDependencies()
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-24011
>                 URL: https://issues.apache.org/jira/browse/SPARK-24011
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: wuyi
>            Priority: Minor
>
> When creating stages for a job, we need to find the immediate parent
> ShuffleDependencies of each rdd (except the final rdd) by calling
> getShuffleDependencies() at least twice: first in
> getMissingAncestorShuffleDependencies(), and then in
> getOrCreateParentStages().
> So we can cache the result the first time we call getShuffleDependencies().
> This reduces the time spent when there are many NarrowDependencies between
> the rdd and its immediate parent ShuffleDependencies, or when the rdd has
> many immediate parent ShuffleDependencies.
>
> There is one exception, for checkpointed rdds. If an rdd is checkpointed, its
> immediate parent ShuffleDependencies should be reset to empty.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
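
The caching idea described in the issue can be illustrated with a small standalone model. This is only a sketch in Python, not Spark's Scala code and not the change made in the pull request: the classes RDD, NarrowDependency, ShuffleDependency and StageBuilder, the method get_shuffle_dependencies, and the traversals counter are all hypothetical stand-ins. It walks back through narrow dependencies until it reaches shuffle dependencies, remembers the result per rdd, and returns an empty result for an rdd that is itself checkpointed. It does not handle an intermediate rdd being checkpointed after a downstream result was cached.

```python
class Dependency:
    """Edge from a child RDD to one parent RDD (toy model, not Spark's class)."""
    def __init__(self, rdd):
        self.rdd = rdd


class NarrowDependency(Dependency):
    pass


class ShuffleDependency(Dependency):
    pass


class RDD:
    """Hypothetical stand-in for an RDD: a name, parent edges, a checkpoint flag."""
    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)
        self.checkpointed = False


class StageBuilder:
    """Hypothetical scheduler fragment that caches immediate shuffle parents."""
    def __init__(self):
        self._cache = {}       # RDD -> frozenset of ShuffleDependency
        self.traversals = 0    # counts full graph walks, for illustration only

    def get_shuffle_dependencies(self, rdd):
        # Exception from the issue: a checkpointed rdd has no shuffle parents,
        # so drop any stale cached entry and return an empty result.
        if rdd.checkpointed:
            self._cache.pop(rdd, None)
            return frozenset()
        if rdd in self._cache:
            return self._cache[rdd]

        self.traversals += 1
        parents = set()
        visited = set()
        waiting = [rdd]
        while waiting:
            current = waiting.pop()
            if current in visited:
                continue
            visited.add(current)
            for dep in current.dependencies:
                if isinstance(dep, ShuffleDependency):
                    parents.add(dep)          # stop at the first shuffle boundary
                else:
                    waiting.append(dep.rdd)   # keep walking through narrow edges

        result = frozenset(parents)
        self._cache[rdd] = result
        return result


# Usage: source --shuffle--> mapped --narrow--> filtered --narrow--> final
source = RDD("source")
shuffle = ShuffleDependency(source)
mapped = RDD("mapped", [shuffle])
filtered = RDD("filtered", [NarrowDependency(mapped)])
final = RDD("final", [NarrowDependency(filtered)])

builder = StageBuilder()
first = builder.get_shuffle_dependencies(final)   # walks the graph
second = builder.get_shuffle_dependencies(final)  # served from the cache
```

In this model the second call skips the walk over the narrow edges entirely, which is where the issue expects the saving when the chain of NarrowDependencies is long.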