Hi, Spark users.
When running a Spark application with a large number of executors (300+), I see the
following failures:
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:583)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:421)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
When I reduce the number of executors, the Spark app runs fine. From the stack
trace, it looks as though many executors downloading their dependencies at the
same time are overwhelming the driver's HTTP file server and causing the reads
to time out.
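One workaround I'm considering is raising the fetch timeout so executors wait
longer before giving up. A minimal sketch, assuming spark.files.fetchTimeout
(in seconds, default 60) still governs these fetches in my version; the
120-second value is an arbitrary guess I have not yet verified:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: give executors more time to fetch files/jars from the
    // driver before the HTTP read times out. 120s is a guess, not a
    // verified fix; the app name is hypothetical.
    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.files.fetchTimeout", "120") // seconds; default is 60

    val sc = new SparkContext(conf)

I haven't confirmed whether this eliminates the failures at 300+ executors, so
pointers either way would help.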
Has anyone experienced similar issues, or does anyone have suggestions?
Thanks