Hm, one thing to see is whether the same port appears many times (1315905645).
The way pyspark works today is that the JVM reads the port from the stdout
of the python process. If there is some interference in output from the
python side (e.g. any print statements, exception messages), then the Java
side will think that it's actually a port even when it's not.

I'm not sure why it fails sometimes but not others, but 2/3 of the time is
a lot...

2015-06-19 14:57 GMT-07:00 John Meehan <meeh...@dls.net>:

> Has anyone encountered this “port out of range” error when launching
> PySpark jobs on YARN?  It is sporadic (e.g. 2/3 jobs get this error).
>
> LOG:
>
> 15/06/19 11:49:44 INFO scheduler.TaskSetManager: Lost task 0.3 in stage
> 39.0 (TID 211) on executor xxx.xxx.xxx.com:
> java.lang.IllegalArgumentException (port out of range:1315905645)
> [duplicate 7]
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> 15/06/19 11:49:44 INFO cluster.YarnScheduler: Removed TaskSet 39.0, whose
> tasks have all completed, from pool
>  File "/home/john/spark-1.4.0/python/pyspark/rdd.py", line 745, in collect
>    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>  File
> "/home/john/spark-1.4.0/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
> line 538, in __call__
>  File
> "/home/john/spark-1.4.0/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError15/06/19 11:49:44 INFO
> storage.BlockManagerInfo: Removed broadcast_38_piece0 on
> 17.134.160.35:47455 in memory (size: 2.2 KB, free: 265.4 MB)
> : An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 1 in stage 39.0 failed 4 times, most recent failure: Lost task 1.3 in stage
> 39.0 (TID 210, xxx.xxx.xxx.com): java.lang.IllegalArgumentException: port
> out of range:1315905645
> at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
> at java.net.InetSocketAddress.<init>(InetSocketAddress.java:185)
> at java.net.Socket.<init>(Socket.java:241)
> at
> org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
> at
> org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
> at
> org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
> at
> org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
> at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:130)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:73)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> * Spark 1.4.0 build:
>
> build/mvn -Pyarn -Phive -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.4
> -DskipTests clean package
>
> LAUNCH CMD:
>
> export HADOOP_CONF_DIR=/path/to/conf
> export PYSPARK_PYTHON=/path/to/python-2.7.2/bin/python
> ~/spark-1.4.0/bin/pyspark \
> --conf
> spark.yarn.jar=/home/john/spark-1.4.0/assembly/target/scala-2.10/spark-assembly-1.4.0-hadoop2.3.0-cdh5.1.4.jar
> \
> --master yarn-client \
> --num-executors 3 \
> --executor-cores 18 \
> --executor-memory 48g
>
> TEST JOB IN REPL:
>
> words = [‘hi’, ‘there’, ‘yo’, ‘baby’]
> wordsRdd = sc.parallelize(words)
> words.map(lambda x: (x,1)).collect()
>

Reply via email to