cxzl25 commented on issue #25078: [SPARK-28305][YARN] Request GetExecutorLossReason to use a smaller timeout parameter
URL: https://github.com/apache/spark/pull/25078#issuecomment-509289198

AM LOG:
19/07/08 16:56:48 [dispatcher-event-loop-0] INFO YarnAllocator: add executor 951 to pendingLossReasonRequests for get the loss reason
19/07/08 16:58:48 [dispatcher-event-loop-26] INFO ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.
19/07/08 16:58:48 [dispatcher-event-loop-26] INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0

Driver LOG:
19/07/08 16:58:48,476 [rpc-server-3-3] ERROR TransportChannelHandler: Connection to /xx.xx.xx.xx:19398 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
19/07/08 16:58:48,476 [rpc-server-3-3] ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /xx.xx.xx.xx:19398 is closed
19/07/08 16:58:48,510 [rpc-server-3-3] WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connection from /xx.xx.xx.xx:19398 closed
19/07/08 16:58:48,516 [netty-rpc-env-timeout] WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 951 at RPC address xx.xx.xx.xx:49175, but got no response. Marking as slave lost.
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from null in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
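The timestamps show the loss-reason request being queued at 16:56:48 and the driver giving up exactly 120 seconds later, the default ask timeout (spark.rpc.askTimeout, falling back to spark.network.timeout). The sketch below is only an illustration of the idea in the PR title, not the actual Spark change: it asks for the loss reason with a hypothetical smaller, dedicated timeout and treats a timeout as "slave lost". The names lossReasonAskTimeout and askLossReason are made up for the example.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object LossReasonAskSketch {
  // Hypothetical smaller timeout dedicated to the loss-reason ask,
  // instead of reusing the generic 120s spark.rpc.askTimeout.
  val lossReasonAskTimeout: FiniteDuration = 30.seconds

  // Placeholder for the RPC round trip to the AM; in Spark this is the
  // GetExecutorLossReason ask handled on the YARN scheduler endpoint.
  def askLossReason(executorId: String): Future[String] = Future {
    s"Container for executor $executorId was preempted"
  }

  def main(args: Array[String]): Unit = {
    val reason =
      try Await.result(askLossReason("951"), lossReasonAskTimeout)
      catch {
        case _: TimeoutException =>
          "no response within timeout; marking executor as slave lost"
      }
    println(reason)
  }
}
```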
