It looks like your driver is being flooded by requests from that many
executors and is timing out as a result. There are configuration options
such as spark.akka.timeout that you could try adjusting. More information
is available here:
http://spark.apache.org/docs/latest/configuration.html
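
As a rough sketch of how such a setting could be passed at launch time, the
Python snippet below assembles a spark-submit command line with one --conf
flag per option. The helper name build_spark_submit, the jar name my-app.jar
and the value 300 (seconds) are placeholders I made up for illustration.
spark.files.fetchTimeout is my own addition: since the trace fails inside
Utils.fetchFile, it may be the more relevant knob, but please check it
against the configuration page for your Spark version before relying on it.

```python
def build_spark_submit(app, conf):
    """Return a spark-submit argument list with one --conf flag per setting.

    Hypothetical helper for illustration; it only builds the command line
    and does not launch anything.
    """
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", "{}={}".format(key, value)]
    args.append(app)
    return args


# Assumed values: raise both timeouts well above their defaults and tune
# them from there.
timeouts = {
    "spark.akka.timeout": 300,
    "spark.files.fetchTimeout": 300,  # assumption: verify for your version
}

print(" ".join(build_spark_submit("my-app.jar", timeouts)))
```

The same key/value pairs could instead go into spark-defaults.conf or be set
on the SparkConf in the application itself.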
Thanks
Best Regards
On Mon, Mar 23, 2015 at 9:46 AM, Tianshuo Deng td...@twitter.com.invalid
wrote:
Hi, spark users.
When running a Spark application with lots of executors (300+), I see the
following failure:
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:583)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:421)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
When I reduce the number of executors, the Spark app runs fine. From the
stack trace, it looks like multiple executors requesting to download
dependencies at the same time is causing the driver to time out?
Has anyone experienced similar issues, or does anyone have suggestions?
Thanks