Sergei Lebedev created SPARK-22805:
--------------------------------------
Summary: Use aliases for StorageLevel in event logs
Key: SPARK-22805
URL: https://issues.apache.org/jira/browse/SPARK-22805
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.2.1, 2.1.2
Reporter: Sergei Lebedev
Priority: Minor
Fact 1: {{StorageLevel}} has a private constructor, so users cannot extend the set of predefined levels.

Fact 2: The event log format uses a redundant representation for storage levels:
{code}
>>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}')
79
>>> len('DISK_ONLY')
9
{code}

Fact 3: This inflates event logs for workloads with many partitions, because every partition carries a storage level field that is 60-70 bytes longer than it needs to be.

Suggested quick win: identify the predefined levels by name in the event log.
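A minimal sketch of the suggested quick win, in Python for illustration only (Spark writes its event log from Scala, and this is not Spark's JSON serialization code). The names PREDEFINED_LEVELS, encode_storage_level and decode_storage_level are hypothetical, the table covers only a subset of the predefined levels, and the flag values are assumed to match Spark's StorageLevel constants. The writer emits the alias when the four logged flags match a predefined level and falls back to the verbose object otherwise; the reader accepts both forms, so existing logs stay readable.

```python
import json

# Field names as they appear in the event log JSON.
FIELDS = ("Use Disk", "Use Memory", "Deserialized", "Replication")

# Hypothetical alias table: (Use Disk, Use Memory, Deserialized, Replication)
# -> name of a predefined level. Assumed to mirror Spark's StorageLevel
# constants; only a subset is listed, and off-heap is ignored because the
# logged object above has no off-heap field.
PREDEFINED_LEVELS = {
    (True, False, False, 1): "DISK_ONLY",
    (True, False, False, 2): "DISK_ONLY_2",
    (False, True, True, 1): "MEMORY_ONLY",
    (False, True, True, 2): "MEMORY_ONLY_2",
    (False, True, False, 1): "MEMORY_ONLY_SER",
    (True, True, True, 1): "MEMORY_AND_DISK",
    (True, True, False, 1): "MEMORY_AND_DISK_SER",
}

# Reverse lookup used when reading a log back.
LEVELS_BY_NAME = {name: flags for flags, name in PREDEFINED_LEVELS.items()}


def encode_storage_level(level):
    """Return the alias for a predefined level, else the verbose dict unchanged."""
    key = tuple(level[field] for field in FIELDS)
    return PREDEFINED_LEVELS.get(key, level)


def decode_storage_level(value):
    """Accept either an alias string (new form) or a verbose dict (old form)."""
    if isinstance(value, str):
        return dict(zip(FIELDS, LEVELS_BY_NAME[value]))
    return value


# Usage: compare the serialized sizes of the two forms for one partition.
verbose = {"Use Disk": True, "Use Memory": False, "Deserialized": False, "Replication": 1}
print(len(json.dumps(verbose)), len(json.dumps(encode_storage_level(verbose))))
```

Because unknown flag combinations fall through to the verbose object, levels created outside the predefined set still serialize losslessly; only the common cases shrink.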