[ https://issues.apache.org/jira/browse/SPARK-25174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-25174.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 22180
[https://github.com/apache/spark/pull/22180]

> ApplicationMaster suspends when unregistering itself from RM with extreme large diagnostic message
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25174
>                 URL: https://issues.apache.org/jira/browse/SPARK-25174
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
>
> We recently ran into SPARK-18016, which has been fixed in v2.3.0. This JIRA is
> not about the issue in SPARK-18016 itself but about a side effect it causes. When
> SPARK-18016 occurs, the ApplicationMaster fails to unregister itself because the
> exception contains an extremely large error message.
> {code:java}
> ERROR yarn.ApplicationMaster: User class threw exception:
> java.lang.RuntimeException: Error while decoding:
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to
> compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown
> past JVM limit of 0xFFFF
> /* 001 */ public java.lang.Object generate(Object[] references) {
> ....
> /* 395656 */ mutableRow.update(0, value);
> /* 395657 */ }
> /* 395658 */
> /* 395659 */ return mutableRow;
> /* 395660 */ }
> /* 395661 */ }
> {code}
> The codegen text above is included in the final message that the AM sends to
> the RM when unregistering, and it ends up crashing the RM's ZKRMStateStore,
> because YARN-6125 does not cover truncation of the
> unregisterApplicationMaster message.
> We have also filed a JIRA on the YARN side:
> https://issues.apache.org/jira/browse/YARN-8691
> Although SPARK-18016 has already been fixed, other uncaught exceptions may
> still cause this problem. I think we should limit the size of the error
> message sent to the RM when unregistering the AM.
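
The proposal above (capping the diagnostic message before the AM unregisters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from pull request 22180: the class `DiagnosticsLimiter`, the method `limit`, the truncation marker, and the 1024-character cap are all hypothetical names and values chosen for the example, and the limit is counted in UTF-16 chars rather than bytes.

```java
// Hypothetical helper for illustration only; it is NOT Spark's actual implementation.
final class DiagnosticsLimiter {
    // Assumed marker appended so readers can tell the message was cut short.
    private static final String MARKER = "... [truncated]";

    private DiagnosticsLimiter() {}

    /**
     * Returns the diagnostics unchanged when they fit within maxChars,
     * otherwise a prefix that is exactly maxChars long (including the
     * marker whenever there is room for it).
     */
    static String limit(String diagnostics, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be non-negative");
        }
        if (diagnostics == null || diagnostics.length() <= maxChars) {
            return diagnostics;
        }
        if (maxChars <= MARKER.length()) {
            // Not enough room for the marker, so return a bare prefix.
            return diagnostics.substring(0, maxChars);
        }
        return diagnostics.substring(0, maxChars - MARKER.length()) + MARKER;
    }

    public static void main(String[] args) {
        // Simulate an exception message that carries a huge codegen dump.
        StringBuilder huge = new StringBuilder("failed to compile:\n");
        for (int i = 1; i <= 400000; i++) {
            huge.append("/* ").append(i).append(" */ mutableRow.update(0, value);\n");
        }
        // Cap the message before it would be handed to the RM.
        String finalMessage = limit(huge.toString(), 1024);
        System.out.println(finalMessage.length());
    }
}
```

In a real ApplicationMaster the cap would presumably come from configuration and be applied just before the unregister call, so that no uncaught exception, whatever its size, can push an oversized message into the RM's state store.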