Hi all,

I am running into a problem where, once in a while, my job throws the following exception:

java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.r.RRunner.compute(RRunner.scala:77)
    at org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:436)
    at org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:418)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

18/10/16 08:47:18:388 INFO CoarseGrainedExecutorBackend: Got assigned task 2059
18/10/16 08:47:18:388 INFO Executor: Running task 22.0 in stage 21.0 (TID 2059)
18/10/16 08:47:18:391 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1 blocks
18/10/16 08:47:18:391 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/10/16 08:47:18:394 ERROR Executor: Exception in task 22.0 in stage 21.0 (TID 2059)
java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
It does not happen every time, but roughly after 15 runs or so. I am using Apache Livy to submit R scripts to the cluster, which runs CentOS 7.5 with Spark 2.2.1 and R 3.5.1.

Another system, with an older CentOS 6.6 and R 3.2.2, runs the same R script very stably, without any problems.

Is it possible that this R version is somehow not compatible with Spark 2.2.1?

Thanks,
Thijs
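In case it helps anyone looking at this: one thing I have been experimenting with is raising the SparkR socket timeouts at submission time. The properties below are standard Spark 2.x configuration options for the R backend; whether they actually avoid this particular accept timeout is an assumption on my part, and "my_script.R" is just a placeholder for the real script.

```shell
# Sketch of a possible workaround (not a confirmed fix): give the R worker
# more time to connect back to the JVM before RRunner's accept() times out.
# spark.r.backendConnectionTimeout is in seconds, spark.r.heartBeatInterval too.
spark-submit \
  --conf spark.r.backendConnectionTimeout=6000 \
  --conf spark.r.heartBeatInterval=60 \
  my_script.R
```

With Livy, the same properties can be passed in the `conf` map of the batch/session creation request instead of on the spark-submit command line.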