Hi Andrei,
Could you please post the stderr logfile from the failed executor? You can
find this in the "work" subdirectory of the worker that had the failed
task. You'll need the executor id to find the corresponding stderr file.
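If it helps, on a standalone cluster the executor logs usually end up under the
worker's work directory, at a path roughly like this (where <app-id> and
<executor-id> are placeholders for your application and executor ids):

    $SPARK_HOME/work/<app-id>/<executor-id>/stderr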
Thanks,
-Jey
On Friday, November 15, 2013, Andrei wrote:
> I have two Python modules/scripts, task.py and runner.py. The first one
> (task.py) is a small Spark job and works perfectly well by itself.
> However, when called from runner.py with exactly the same arguments, it
> fails with only a useless message (both in the terminal and in the worker logs).
>
> org.apache.spark.SparkException: Python worker exited unexpectedly
> (crashed)
>
> Below is the code for both task.py and runner.py:
>
> task.py
> -----------
>
> #!/usr/bin/env pyspark
> from __future__ import print_function
> from pyspark import SparkContext
>
> def process(line):
>     return line.strip()
>
> def main(spark_master, path):
>     sc = SparkContext(spark_master, 'My Job')
>     rdd = sc.textFile(path)
>     rdd = rdd.map(process)  # this line causes trouble when called from runner.py
>     count = rdd.count()
>     print(count)
>
> if __name__ == '__main__':
>     main('spark://spark-master-host:7077',
>          'hdfs://hdfs-namenode-host:8020/path/to/file.log')
>
>
> runner.py
> -------------
>
> #!/usr/bin/env pyspark
>
> import task
>
> if __name__ == '__main__':
>     task.main('spark://spark-master-host:7077',
>               'hdfs://hdfs-namenode-host:8020/path/to/file.log')
>
>
> -------------------------------------------------------------------------------------------
>
> So, what's the difference between calling a PySpark-enabled script directly
> and importing it as a Python module? What are good rules for writing
> multi-module Python programs with Spark?
>
> Thanks,
> Andrei
>