Misha Dmitriev created SPARK-24827:
--------------------------------------

             Summary: Some memory waste in History Server by strings in AccumulableInfo objects
                 Key: SPARK-24827
                 URL: https://issues.apache.org/jira/browse/SPARK-24827
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.2.2
            Reporter: Misha Dmitriev
I've analyzed a heap dump of Spark History Server with jxray ([www.jxray.com|http://www.jxray.com/]) and found that 42% of the heap is wasted due to duplicate strings. The biggest sources of such strings are the {{name}} and {{value}} data fields of {{AccumulableInfo}} objects:

{code:java}
7. Duplicate Strings: overhead 42.1%

  Total strings   Unique strings   Duplicate values   Overhead
     13,732,278          729,234            354,032   867,177K (42.1%)

Expensive data fields:

  318,421K (15.4%), 3669685 / 100% dup strings (8 unique), 3669685 dup backing arrays:
     ↖org.apache.spark.scheduler.AccumulableInfo.name
  178,994K (8.7%), 3674403 / 99% dup strings (35640 unique), 3674403 dup backing arrays:
     ↖scala.Some.x
  168,601K (8.2%), 3401960 / 92% dup strings (175826 unique), 3401960 dup backing arrays:
     ↖org.apache.spark.scheduler.AccumulableInfo.value
{code}

That is, 15.4% of the heap is wasted by {{AccumulableInfo.name}} and 8.2% by {{AccumulableInfo.value}}.

It turns out that the problem has been partially addressed in Spark 2.3+, e.g. [https://github.com/apache/spark/blob/b045315e5d87b7ea3588436053aaa4d5a7bd103f/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L590]. However, this code has two minor problems:
# Strings for {{AccumulableInfo.value}} are not interned in the above code; only {{AccumulableInfo.name}} is.
# For interning, the {{weakIntern(String)}} method uses a Guava interner ({{stringInterner = Interners.newWeakInterner[String]()}}). This is an old-fashioned, less efficient way of interning strings. Since a JDK 7 update released some 3-4 years ago, the JVM's built-in {{String.intern()}} method has been much more efficient, in terms of both CPU and memory.

It is therefore suggested to add interning for {{value}} and to replace the Guava interner with {{String.intern()}}.
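The suggested change can be sketched as below. This is a minimal Java sketch under stated assumptions, not Spark's actual code: the real fix would go into the Scala helper linked above, and {{InternSketch}}, {{AccumInfo}}, {{newAccumInfo}} and the sample accumulator name are hypothetical names chosen for illustration. It shows both {{name}} and {{value}} passed through the JVM's {{String.intern()}}, so equal strings that were created separately end up sharing a single instance.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: intern both the name and the value of accumulator
 * info entries with String.intern() instead of a Guava weak interner.
 * AccumInfo is an illustrative stand-in, not Spark's AccumulableInfo.
 */
public class InternSketch {

    /** Simplified, hypothetical stand-in for an accumulable info entry. */
    static final class AccumInfo {
        final long id;
        final String name;
        final String value;

        AccumInfo(long id, String name, String value) {
            this.id = id;
            this.name = name;
            this.value = value;
        }
    }

    /** Null-safe wrapper around the JVM's built-in String.intern(). */
    static String intern(String s) {
        return s == null ? null : s.intern();
    }

    /** Builds an entry with BOTH name and value interned. */
    static AccumInfo newAccumInfo(long id, String rawName, String rawValue) {
        return new AccumInfo(id, intern(rawName), intern(rawValue));
    }

    public static void main(String[] args) {
        List<AccumInfo> infos = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // new String(...) forces a distinct copy, mimicking strings that
            // are deserialized separately (hypothetical example data).
            String name = new String("exampleAccumulator");
            String value = new String("42");
            infos.add(newAccumInfo(i, name, value));
        }
        // After interning, equal strings are the very same object.
        boolean shared = infos.get(0).name == infos.get(2).name
                && infos.get(0).value == infos.get(2).value;
        System.out.println("shared=" + shared); // prints "shared=true"
    }
}
```

The reference comparison ({{==}}) is deliberate here: it checks that only one copy of each distinct string remains reachable, which is where the heap savings reported above would come from.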
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)