Hi, I have been trying to use Spark to process some logs, and I have run into several difficulties along the way. I managed to overcome most of them, but I am really stuck on the last one.
I would really like to know how Spark is supposed to be deployed. For now, I have an SSH key on the master that can log in to any worker, and start-master.sh and start-slaves.sh work. Following the docs, I crafted this command:

    ~/projects/bigdata/spark/spark/bin/spark-submit --py-files /home/javier/projects/bigdata/bdml/dist/bdml-0.0.1.zip --master='spark://10.0.0.71:7077' ml/spark_pipeline.py /srv/bdml/raw2json/json-logs.gz

First, deploying my project seemed like an impossible quest. I kept getting module import errors:

    Traceback (most recent call last):
      File "/home/javier/projects/bigdata/bdml/ml/spark_pipeline.py", line 10, in <module>
        from .files import get_interesting_files

I tried everything I could think of, and at one point I even had to dig into the Scala code to trace that error. In the end I just merged all the functions of the project into one file. Then I started to get the following error:

    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.0.0.73): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
        ("%d.%d" % sys.version_info[:2], version))
    Exception: Python in worker has different version 2.7 than that in driver 3.4, PySpark cannot run with different minor versions

I have specified #!/usr/bin/env python3 at the top of the file, and my spark-env.sh on each worker contains the following lines:

    SPARK_MASTER_IP=10.0.0.71
    export PYSPARK_PYTHON=python3.4
    PYSPARK_PYTHON=python3.4
    export PYTHONHASHSEED=123
    PYTHONHASHSEED=123

I had to specify PYTHONHASHSEED because it wasn't propagating to the workers.
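Judging only from the traceback, the worker seems to compare its own major.minor Python version against the one reported by the driver and to abort when they differ. Below is a minimal sketch of that comparison. It is not PySpark's actual code: the function name check_python_version and its arguments are hypothetical, and it only illustrates why a 2.7 worker is rejected by a 3.4 driver.

```python
import sys


def check_python_version(driver_version, worker_version_info=None):
    """Hypothetical helper: compare a driver's "major.minor" string with
    the version of the interpreter running this code (or a given tuple)."""
    if worker_version_info is None:
        worker_version_info = sys.version_info
    # Same formatting as the expression shown in the traceback.
    worker_version = "%d.%d" % tuple(worker_version_info[:2])
    if worker_version != driver_version:
        raise RuntimeError(
            "Python in worker has different version %s than that in driver %s"
            % (worker_version, driver_version)
        )
    return worker_version


if __name__ == "__main__":
    # Report which interpreter actually runs this script on the machine.
    print("%d.%d" % sys.version_info[:2])
```

Running the script on a worker with whatever interpreter the worker process resolves shows which major.minor version would be reported there; a result of 2.7 would match the error above.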
I hope you can help me,

Javier Domingo Cansino
Research & Development Engineer