Reynold Xin created SPARK-3613:
----------------------------------

             Summary: Don't record the size of each shuffle block for large jobs
                 Key: SPARK-3613
                 URL: https://issues.apache.org/jira/browse/SPARK-3613
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle, Spark Core
            Reporter: Reynold Xin
            Assignee: Reynold Xin
MapStatus records the size of each shuffle block (1 byte per block) produced by a map task. This means the total shuffle metadata is O(M*R), where M = number of map tasks and R = number of reduce tasks. When M exceeds a certain size, we should probably just send a single average block size instead of the whole array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
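The proposal above could be sketched as follows. This is a minimal illustration, not Spark's actual MapStatus implementation: all names (`MapStatusSketch`, `ExactSizes`, `AveragedSizes`) and the cutoff constant are hypothetical, chosen only to show how per-block sizes might be replaced by an average once the number of blocks grows large.

```scala
// Hypothetical sketch of the proposed optimization (not Spark's real API):
// below a threshold of reduce partitions, keep exact per-block sizes;
// above it, store only the average so metadata is O(1) per map task
// instead of O(R), making the total O(M) instead of O(M*R).

sealed trait MapStatusSketch {
  def blockSize(reduceId: Int): Long
}

// Exact sizes: one entry per reduce partition (O(R) per map task).
final case class ExactSizes(sizes: Array[Long]) extends MapStatusSketch {
  def blockSize(reduceId: Int): Long = sizes(reduceId)
}

// Averaged sizes: a single value regardless of R (O(1) per map task).
final case class AveragedSizes(avg: Long, numBlocks: Int) extends MapStatusSketch {
  def blockSize(reduceId: Int): Long = avg
}

object MapStatusSketch {
  // Illustrative cutoff, not a constant taken from Spark.
  val Threshold = 2000

  def apply(sizes: Array[Long]): MapStatusSketch =
    if (sizes.length <= Threshold) {
      ExactSizes(sizes)
    } else {
      // Average over non-empty blocks so empty partitions
      // don't drag the estimate toward zero.
      val nonEmpty = sizes.filter(_ > 0)
      val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
      AveragedSizes(avg, sizes.length)
    }
}
```

The trade-off is that reducers reading an `AveragedSizes` status see only an estimate of each block's size, which is acceptable when the sizes are used for scheduling heuristics rather than exact accounting.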