shockdm edited a comment on pull request #29496:
URL: https://github.com/apache/spark/pull/29496#issuecomment-685064485
@jkleckner Hi Jim, I tried your fix and ran into a strange issue: spark-submit
quits immediately while the driver keeps running:
```
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:990)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:948)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at okio.Okio$2.read(Okio.java:140)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
at okio.RealBufferedSource.request(RealBufferedSource.java:68)
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
This is with the patch applied on top of 2.4.6.
Edit:
It seems that the current code in the PR never waits for application
completion, specifically:
```
} else {
  logInfo(s"Deployed Spark application ${appId} with submission ID $sId into Kubernetes")
  // Always act like the application has completed since we don't want to wait for app completion
  true
}
```
where the enclosing condition is
```
if (hasCompleted()) {
```
So if the application hasn't completed, spark-submit terminates, since the
current code just quits and does not wait. I'm not sure this is the
intention. In the current `master`, instead of simply checking
`hasCompleted`, the check is
```
if (conf.get(WAIT_FOR_APP_COMPLETION)) {
```
I'm not sure whether the change is intentional. We likely want to check that
flag, unless it's not present in the `2.4.6` stream. Either way, the error
I posted above is caused by not waiting for the app to complete.
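
To make the difference between the two checks concrete, here is a minimal
stand-alone Scala sketch. It does not use Spark's classes: `SubmitConf`,
`stopAsInPatch`, and `stopWithFlag` are hypothetical names, and
`waitForAppCompletion` only stands in for the value of
`conf.get(WAIT_FOR_APP_COMPLETION)`. The returned Boolean models whether the
submission client stops watching the driver.

```scala
// Hypothetical model of the two checks discussed above; not Spark's actual code.
object WaitForCompletionSketch {

  // Stand-in for the setting read via conf.get(WAIT_FOR_APP_COMPLETION).
  final case class SubmitConf(waitForAppCompletion: Boolean)

  // Behaviour described for the patched code: the decision is keyed on
  // hasCompleted, and the "not completed" branch always returns true.
  def stopAsInPatch(hasCompleted: Boolean): Boolean =
    if (hasCompleted) {
      true // assumption: a completed app also ends the watch
    } else {
      true // "always act like the application has completed"
    }

  // Flag-gated variant: only skip waiting when the user asked not to wait.
  def stopWithFlag(conf: SubmitConf, hasCompleted: Boolean): Boolean =
    if (conf.waitForAppCompletion) {
      hasCompleted // keep watching until the app has really finished
    } else {
      true // fire-and-forget submission
    }

  def main(args: Array[String]): Unit = {
    val stillRunning = false
    println(s"patched check stops: ${stopAsInPatch(stillRunning)}")
    println(s"flag-gated check stops: ${stopWithFlag(SubmitConf(true), stillRunning)}")
  }
}
```

Under these assumptions, the patched check stops watching a still-running
application, while the flag-gated check keeps watching as long as waiting
was requested.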