cekicbaris commented on pull request #167:
URL: https://github.com/apache/incubator-livy/pull/167#issuecomment-698866352
> @cekicbaris , would be nice to see Spark Driver logs during this failure, I believe it might be related to Livy <-> Spark Driver communication. Also might be the networking is not very stable in your env, not sure if Livy does retries.

@jahstreet Could you give some advice on how to check the networking? It is running on AWS EKS (Kubernetes 1.15) in the `livy` namespace. The `livy` service account has sufficient permissions.

One more thing: if the session times out, the interactive session is deleted from Livy, but the driver and executor pods keep running. If I instead delete the session with a DELETE request to the REST API, the pods are deleted as well.

Here is the log of a timed-out session:

```
2020-09-23T07:19:29.853571515Z 2020-09-23 07:19:29 INFO InteractiveSessionManager:39 - Deleting InteractiveSession 2 because it was inactive for more than 3600000.0 ms.
2020-09-23T07:19:29.853690768Z 2020-09-23 07:19:29 INFO InteractiveSession:39 - Stopping InteractiveSession 2...
2020-09-23T07:19:29.994715243Z 2020-09-23 07:19:29 WARN RpcDispatcher:191 - [ClientProtocol] Closing RPC channel with 1 outstanding RPCs.
2020-09-23T07:19:30.013570228Z 2020-09-23 07:19:30 INFO InteractiveSession:39 - Stopped InteractiveSession 2.
2020-09-23T07:19:39.119654462Z 2020-09-23 07:19:39 ERROR SparkKubernetesApp:56 - Unknown Kubernetes state unknown for app with tag livy-session-2-dfceC0pO.
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
