Hi,

thanks for your replies. I'm out of the office now, so I will check it out on
Monday morning, but the guess about serialization/deserialization looks
plausible.

Thanks,
Andrei


On Sat, Nov 16, 2013 at 11:11 AM, Jey Kottalam <[email protected]> wrote:

> Hi Andrei,
>
> Could you please post the stderr logfile from the failed executor? You can
> find it in the "work" subdirectory of the worker that had the failed
> task. You'll need the executor id to find the corresponding stderr file.
>
> Thanks,
> -Jey
>
>
> On Friday, November 15, 2013, Andrei wrote:
>
>> I have 2 Python modules/scripts - task.py and runner.py. The first one
>> (task.py) is a small Spark job and works perfectly well by itself.
>> However, when called from runner.py with exactly the same arguments, it
>> fails with only this unhelpful message (both in the terminal and in the
>> worker logs):
>>
>>     org.apache.spark.SparkException: Python worker exited unexpectedly
>> (crashed)
>>
>> Below is the code for both task.py and runner.py:
>>
>> task.py
>> -----------
>>
>> #!/usr/bin/env pyspark
>> from __future__ import print_function
>> from pyspark import SparkContext
>>
>> def process(line):
>>     return line.strip()
>>
>> def main(spark_master, path):
>>     sc = SparkContext(spark_master, 'My Job')
>>     rdd = sc.textFile(path)
>>     rdd = rdd.map(process)  # this line causes trouble when called from runner.py
>>     count = rdd.count()
>>     print(count)
>>
>> if __name__ == '__main__':
>>     main('spark://spark-master-host:7077',
>>             'hdfs://hdfs-namenode-host:8020/path/to/file.log')
>>
>>
>> runner.py
>> -------------
>>
>> #!/usr/bin/env pyspark
>>
>> import task
>>
>> if __name__ == '__main__':
>>     task.main('spark://spark-master-host:7077',
>>                    'hdfs://hdfs-namenode-host:8020/path/to/file.log')
>>
>>
>> -------------------------------------------------------------------------------------------
>>
>> So, what's the difference between calling a PySpark-enabled script directly
>> and importing it as a Python module? What are good rules for writing
>> multi-module Python programs with Spark?
>>
>> Thanks,
>> Andrei
>>
>
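If the serialization guess pans out, one plausible difference is where `process` is defined: run directly, it lives in `__main__`, but imported from runner.py it lives in the module `task`, which the remote Python workers must then be able to import. A minimal sketch with the standard `pickle` module, illustrating by-reference pickling of top-level functions (an illustration only, not PySpark's actual cloudpickle code path):

```python
import pickle

def process(line):
    # same logic as process() in task.py
    return line.strip()

# pickle stores top-level functions by reference (module name plus
# qualified name), so the unpickling side must be able to import that
# same module. When this file runs as a script, process lives in
# __main__; when imported, it lives in its module, which remote workers
# may not have on their PYTHONPATH.
payload = pickle.dumps(process)
restored = pickle.loads(payload)
print(restored('  some line \n'))
```

If that turns out to be the cause, the usual workaround would be to ship the module to the executors with `sc.addPyFile('task.py')` in runner.py, or to distribute the module on the workers' PYTHONPATH.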