I added deliberately generous (120-second) timeouts to the OkHttpClient.Builder() in
RetrofitClientFactory.scala, and I'm no longer timing out.

val okHttpClientBuilder = new OkHttpClient.Builder()
  .dispatcher(dispatcher)
  .proxy(resolvedProxy)
  .connectTimeout(120, TimeUnit.SECONDS)
  .writeTimeout(120, TimeUnit.SECONDS)
  .readTimeout(120, TimeUnit.SECONDS)
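
In case it helps anyone else hitting this, here's a rough sketch of how the same builder
chain could take the timeout as a parameter instead of hard-coding 120 seconds. The
buildOkHttpClient helper and timeoutSeconds parameter are just hypothetical names for
illustration, not anything that exists in the codebase today:

import java.util.concurrent.TimeUnit

import okhttp3.{Dispatcher, OkHttpClient}

// Hypothetical helper: same builder chain as above, but with the timeout
// supplied by the caller (e.g. read from a Spark conf) rather than
// hard-coded to 120 seconds.
def buildOkHttpClient(
    dispatcher: Dispatcher,
    resolvedProxy: java.net.Proxy,
    timeoutSeconds: Long): OkHttpClient = {
  new OkHttpClient.Builder()
    .dispatcher(dispatcher)
    .proxy(resolvedProxy)
    .connectTimeout(timeoutSeconds, TimeUnit.SECONDS)
    .writeTimeout(timeoutSeconds, TimeUnit.SECONDS)
    .readTimeout(timeoutSeconds, TimeUnit.SECONDS)
    .build()
}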

-Jenna

On Tue, Mar 27, 2018 at 10:48 AM, Jenna Hoole <jenna.ho...@gmail.com> wrote:

> So I'm running into an issue with my resource staging server that's
> producing a stacktrace like Issue 342
> <https://github.com/apache-spark-on-k8s/spark/issues/342>, but I don't
> think for the same reasons. Every time I start up a resource staging
> server, the first job submitted that uses it fails with a
> java.net.SocketTimeoutException: timeout, and then every subsequent job
> runs perfectly, including with different jars and different users. It's
> only ever the first job that fails, and it always fails. I know I'm also
> running into Issue 577
> <https://github.com/apache-spark-on-k8s/spark/issues/577> in that it
> takes about three minutes before the resource staging server is
> accessible, but the first job still fails even after I wait over ten
> minutes, and in one case overnight. And I'm just using the examples jar,
> so it's not a super large jar like in Issue 342.
>
> This isn't great for our CI process, so has anyone seen anything like this
> before, or does anyone know how to increase the timeout if the initial
> contact just takes a while? Setting spark.network.timeout has no effect.
>
> [jhoole@nid00006 spark]$ kubectl get pods | grep jhoole-spark
>
> jhoole-spark-resource-staging-server-64666675c8-w5cdm   1/1       Running   0          13m
>
> [jhoole@nid00006 spark]$ kubectl get svc | grep jhoole-spark
>
> jhoole-spark-resource-staging-service               NodePort    10.96.143.55   <none>        10000:30622/TCP     13m
>
> [jhoole@nid00006 spark]$ bin/spark-submit \
>     --class org.apache.spark.examples.SparkPi \
>     --conf spark.app.name=spark-pi \
>     --conf spark.kubernetes.resourceStagingServer.uri=http://192.168.0.1:30622 \
>     ./examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar
>
> 2018-03-27 12:30:13 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2018-03-27 12:30:13 INFO  UserGroupInformation:966 - Login successful for user jhoole@local using keytab file /security/secrets/jhoole.keytab
> 2018-03-27 12:30:14 INFO  HadoopStepsOrchestrator:54 - Hadoop Conf directory: /etc/hadoop/conf
> 2018-03-27 12:30:14 INFO  SecurityManager:54 - Changing view acls to: jhoole
> 2018-03-27 12:30:14 INFO  SecurityManager:54 - Changing modify acls to: jhoole
> 2018-03-27 12:30:14 INFO  SecurityManager:54 - Changing view acls groups to:
> 2018-03-27 12:30:14 INFO  SecurityManager:54 - Changing modify acls groups to:
> 2018-03-27 12:30:14 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jhoole); groups with view permissions: Set(); users  with modify permissions: Set(jhoole); groups with modify permissions: Set()
>
> Exception in thread "main" java.net.SocketTimeoutException: timeout
> at okio.Okio$4.newTimeoutException(Okio.java:230)
> at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
> at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
> at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
> at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
> at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
> at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
> at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
> at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
> at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
> at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
> at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
> at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
> at okhttp3.RealCall.execute(RealCall.java:69)
> at retrofit2.OkHttpCall.execute(OkHttpCall.java:174)
> at org.apache.spark.deploy.k8s.submit.SubmittedDependencyUploaderImpl.getTypedResponseResult(SubmittedDependencyUploaderImpl.scala:101)
> at org.apache.spark.deploy.k8s.submit.SubmittedDependencyUploaderImpl.doUpload(SubmittedDependencyUploaderImpl.scala:97)
> at org.apache.spark.deploy.k8s.submit.SubmittedDependencyUploaderImpl.uploadJars(SubmittedDependencyUploaderImpl.scala:70)
> at org.apache.spark.deploy.k8s.submit.submitsteps.initcontainer.SubmittedResourcesInitContainerConfigurationStep.configureInitContainer(SubmittedResourcesInitContainerConfigurationStep.scala:48)
> at org.apache.spark.deploy.k8s.submit.submitsteps.InitContainerBootstrapStep$$anonfun$configureDriver$1.apply(InitContainerBootstrapStep.scala:43)
> at org.apache.spark.deploy.k8s.submit.submitsteps.InitContainerBootstrapStep$$anonfun$configureDriver$1.apply(InitContainerBootstrapStep.scala:42)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.deploy.k8s.submit.submitsteps.InitContainerBootstrapStep.configureDriver(InitContainerBootstrapStep.scala:42)
> at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(Client.scala:102)
> at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(Client.scala:101)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.deploy.k8s.submit.Client.run(Client.scala:101)
> at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:200)
> at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$5.apply(Client.scala:193)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2551)
> at org.apache.spark.deploy.k8s.submit.Client$.run(Client.scala:193)
> at org.apache.spark.deploy.k8s.submit.Client$.main(Client.scala:213)
> at org.apache.spark.deploy.k8s.submit.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:786)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Socket closed
> at java.net.SocketInputStream.read(SocketInputStream.java:204)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at okio.Okio$2.read(Okio.java:139)
> at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
> ... 47 more
>
> 2018-03-27 12:30:24 INFO  ShutdownHookManager:54 - Shutdown hook called
> 2018-03-27 12:30:24 INFO  ShutdownHookManager:54 - Deleting directory /tmp/uploaded-jars-4c7ca1cf-31d6-4dba-9203-c9a6f1cd4099
>
> Thanks,
> Jenna
>
