[ https://issues.apache.org/jira/browse/SPARK-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Reiss resolved SPARK-4157.
----------------------------------
    Resolution: Duplicate

> Task input statistics incomplete when a task reads from multiple locations
> --------------------------------------------------------------------------
>
>                 Key: SPARK-4157
>                 URL: https://issues.apache.org/jira/browse/SPARK-4157
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Charles Reiss
>            Priority: Minor
>
> SPARK-1683 introduced tracking of filesystem reads for tasks, but the
> tracking code assumes that each task reads from exactly one file or cache
> block, and replaces any prior InputMetrics object for the task after each
> read. But, for example, a task computing a shuffle-less join (where the
> input RDDs are pre-partitioned by key) may read two or more cached
> dependency RDD blocks. In that case, the displayed input size is only that
> of whichever dependency was requested last.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
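The bug described above boils down to overwriting a single per-task InputMetrics object on every read instead of accumulating across reads. A minimal sketch of the difference, using hypothetical class and method names rather than Spark's actual API:

```python
class InputMetrics:
    """Hypothetical stand-in for Spark's per-task input metrics."""
    def __init__(self):
        self.bytes_read = 0


class TaskMetrics:
    def __init__(self):
        self.input_metrics = InputMetrics()

    def record_read_buggy(self, n):
        # Behavior reported in SPARK-4157: each read replaces the prior
        # InputMetrics object, so only the last read is displayed.
        metrics = InputMetrics()
        metrics.bytes_read = n
        self.input_metrics = metrics

    def record_read_accumulating(self, n):
        # Accumulating behavior: sum bytes over every read the task
        # performs, e.g. two cached blocks in a shuffle-less join.
        self.input_metrics.bytes_read += n


buggy, fixed = TaskMetrics(), TaskMetrics()
for size in (100, 250):  # task reads two cached dependency blocks
    buggy.record_read_buggy(size)
    fixed.record_read_accumulating(size)

print(buggy.input_metrics.bytes_read)  # 250 -- only the last read
print(fixed.input_metrics.bytes_read)  # 350 -- total task input
```

With the replacing variant the reported input size depends on read order, which matches the symptom in the report: the displayed size is that of whichever dependency was requested last.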