[ 
https://issues.apache.org/jira/browse/SPARK-25174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-25174.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 22180
[https://github.com/apache/spark/pull/22180]

> ApplicationMaster suspends when unregistering itself from RM with extreme 
> large diagnostic message
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25174
>                 URL: https://issues.apache.org/jira/browse/SPARK-25174
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
>
> We recently ran into SPARK-18016, which has been fixed in v2.3.0. This JIRA is 
> not about SPARK-18016 itself but about a side effect it causes. When 
> SPARK-18016 occurs, the ApplicationMaster fails to unregister itself because 
> the exception carries an extremely large error message.
> {code:java}
> ERROR yarn.ApplicationMaster: User class threw exception: 
> java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown 
> past JVM limit of 0xFFFF
> /* 001 */ public java.lang.Object generate(Object[] references) {
> ....
> /* 395656 */       mutableRow.update(0, value);
> /* 395657 */     }
> /* 395658 */
> /* 395659 */     return mutableRow;
> /* 395660 */   }
> /* 395661 */ }
> {code}
> The codegen text above is included in the final message that the AM sends to 
> the RM when it unregisters. That message ends up crashing the RM's 
> ZKRMStateStore, because YARN-6125 does not cover truncation of the 
> unregisterApplicationMaster message. We have also filed a JIRA on the YARN side: 
> https://issues.apache.org/jira/browse/YARN-8691 
> Although SPARK-18016 has already been fixed, other uncaught exceptions may 
> still cause this problem. I suggest that we limit the size of the error 
> message sent to the RM when unregistering the AM.
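
The suggestion that closes the description, capping the size of the diagnostic message before it is sent to the RM, could look roughly like the sketch below. This is only an illustration under stated assumptions, not the change made in pull request 22180: the class DiagnosticsLimiter, the method limit, the marker text, and the 4096-character cap are all hypothetical names and values chosen for the example. The cap is counted in Java chars, not encoded bytes.

```java
// Hypothetical helper, not part of Spark or YARN: caps a diagnostic message
// before it would be handed to the RM on unregistration.
final class DiagnosticsLimiter {
    // Marker appended so readers can tell the message was cut (assumed text).
    static final String MARKER = "... [truncated]";

    // Returns message unchanged if it fits, otherwise a prefix plus MARKER
    // whose total length never exceeds maxChars.
    static String limit(String message, int maxChars) {
        if (maxChars < MARKER.length()) {
            throw new IllegalArgumentException(
                "maxChars must be at least " + MARKER.length());
        }
        if (message == null) {
            return "";
        }
        if (message.length() <= maxChars) {
            return message;
        }
        int end = maxChars - MARKER.length();
        // Avoid splitting a surrogate pair at the cut point.
        if (end > 0 && Character.isHighSurrogate(message.charAt(end - 1))) {
            end--;
        }
        return message.substring(0, end) + MARKER;
    }

    public static void main(String[] args) {
        // Build a stand-in for a very large codegen dump inside an exception message.
        StringBuilder huge = new StringBuilder("failed to compile:\n");
        for (int i = 1; i <= 100000; i++) {
            huge.append("/* ").append(i).append(" */ mutableRow.update(0, value);\n");
        }
        // 4096 is an arbitrary cap for the example, not a real default.
        String diagnostics = limit(huge.toString(), 4096);
        System.out.println(huge.length() + " chars reduced to " + diagnostics.length());
    }
}
```

Keeping the head of the message rather than the tail preserves the exception type and the first lines of the error, which are usually the most useful part of an oversized codegen dump.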



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
