Re: Run a self-contained Spark app on a Spark standalone cluster

2016-04-16 Thread Ted Yu
Kevin:
Can you describe how you got over the Metadata fetch exception?

> On Apr 16, 2016, at 9:41 AM, Kevin Eid  wrote:
> 
> One last email to announce that I've fixed all of the issues. Don't hesitate 
> to contact me if you encounter the same. I'd be happy to help.
> 
> Regards,
> Kevin


Re: Run a self-contained Spark app on a Spark standalone cluster

2016-04-16 Thread Kevin Eid
One last email to announce that I've fixed all of the issues. Don't
hesitate to contact me if you encounter the same. I'd be happy to help.

Regards,
Kevin
On 14 Apr 2016 12:39 p.m., "Kevin Eid"  wrote:

> Hi all,
>
> I managed to copy my .py files from local to the cluster using SCP. And I
> managed to run my Spark app on the cluster against a small dataset.
>
> However, when I iterate over a 5GB dataset I got the following:
> org.apache.spark.shuffle.MetadataFetchFailedException (please see the
> attached screenshots).
>
> I am deploying 3 m3.xlarge instances and using the following parameters
> when submitting the app: --executor-memory 50g --driver-memory 20g
> --executor-cores 4 --num-executors 3.
>
> Can you recommend a different configuration (number of executors, driver
> and executor memory), or do I have to deploy more and larger instances to
> run my app on 5GB? Or do I need to add more partitions when reading the file?
>
> Best,
> Kevin
>
> --
> Kevin EID
> M.Sc. in Computing, Data Analytics
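
A note on the shuffle failure above: MetadataFetchFailedException generally
means an executor that held shuffle output was lost mid-job, often after
being killed for running out of memory. An m3.xlarge has 15 GB of RAM, so a
request like --executor-memory 50g cannot actually be granted on these
nodes; the memory settings have to fit the instance. Increasing the number
of input partitions also helps, since each task then shuffles a smaller
slice. A minimal PySpark sketch, with the path and partition count as
hypothetical placeholders:

from pyspark import SparkConf, SparkContext

# Keep executor memory within what an m3.xlarge (15 GB) can really provide.
conf = SparkConf().setAppName("weather").set("spark.executor.memory", "10g")
sc = SparkContext(conf=conf)

# Ask for more partitions up front so each task handles a smaller slice...
lines = sc.textFile("s3n://mubucket/weather.csv", minPartitions=48)

# ...or repartition an existing RDD before a wide (shuffling) operation.
lines = lines.repartition(48)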


Re: Run a self-contained Spark app on a Spark standalone cluster

2016-04-12 Thread kevllino
Update:
- I managed to log in to the cluster.
- I want to use copy-dir to deploy my Python files to all nodes. I read that
I need to copy them to /ephemeral/hdfs, but I don't know how to move them
from my local machine to the cluster in the first place.

Thanks in advance,
Kevin
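
For what it's worth, one possible sequence, assuming a cluster launched by
the spark-ec2 script (key file, directory, and file names are hypothetical):

# From the local machine: copy the app directory to the master node.
scp -i key.pem -r ./myfiles root@<master-public-dns>:/root/myfiles

# On the master: rsync that directory out to every slave node.
/root/spark-ec2/copy-dir /root/myfiles

copy-dir only replicates a directory from the master to the slaves; putting
the files into HDFS is a separate step and is only needed if the app reads
its input from HDFS.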






Re: Run a self-contained Spark app on a Spark standalone cluster

2016-04-12 Thread Kevin Eid
Hi,

Thanks for your emails. I tried running your command but it returned "No
such file or directory", so I definitely need to move my local .py files to
the cluster first. I tried to log in (before sshing) but could not find the
master:
./spark-ec2 -k key -i key.pem login weather-cluster
Then, once sshed in, the copy-dir script is located in spark-ec2; but to
replicate my files across all nodes, I first need to get them into the root
folder on the Spark EC2 cluster:
./spark-ec2/copy-dir /root/spark/myfiles

I followed this guide: http://spark.apache.org/docs/latest/ec2-scripts.html.

Do you have any suggestions about how to move those files from local to the
cluster?
Thanks in advance,
Kevin
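
One thing worth checking, assuming the stock spark-ec2 script: it defaults
to the us-east-1 region, while the master hostname quoted earlier in the
thread is in eu-west-1, so login may simply be searching the wrong region.
For example:

./spark-ec2 -k key -i key.pem --region=eu-west-1 login weather-cluster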



-- 
Kevin EID
M.Sc. in Computing, Data Analytics



RE: Run a self-contained Spark app on a Spark standalone cluster

2016-04-12 Thread Sun, Rui
Which py file is your main file (primary py file)? Zip the other two py files. 
Leave the main py file alone. Don't copy them to S3 because it seems that only 
local primary and additional py files are supported.

./bin/spark-submit --master spark://... --py-files <zip of the other py files> <main py file>
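
As a concrete sketch of the above (file names are hypothetical, assuming
main.py is the primary file and the other two are utils.py and model.py):

zip deps.zip utils.py model.py
./bin/spark-submit --master spark://<master-host>:7077 --py-files deps.zip main.py

Note the space between the master URL and --py-files; 7077 is the default
port of a standalone master, and the primary file is passed last, as the
application entry point.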

-Original Message-
From: kevllino [mailto:kevin.e...@mail.dcu.ie] 
Sent: Tuesday, April 12, 2016 5:07 PM
To: user@spark.apache.org
Subject: Run a self-contained Spark app on a Spark standalone cluster

Hi, 

I need to know how to run a self-contained Spark app (3 Python files) on a
Spark standalone cluster. Can I move the .py files to the cluster, or should I
store them locally, on HDFS, or on S3? I tried the following locally and on S3
with a zip of my .py files, as suggested here
<http://spark.apache.org/docs/latest/submitting-applications.html>:

./bin/spark-submit --master
spark://ec2-54-51-23-172.eu-west-1.compute.amazonaws.com:5080 --py-files
s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@mubucket//weather_predict.zip

But get: “Error: Must specify a primary resource (JAR or Python file)”

Best,
Kevin 






Re: Run a self-contained Spark app on a Spark standalone cluster

2016-04-12 Thread Alonso Isidoro Roman
I don't know how to do it with Python, but Scala has a plugin named
sbt-pack that creates a self-contained Unix command from your code, so there
is no need to use spark-submit. Something similar to this tool probably
exists for Python.



Alonso Isidoro Roman.

Mis citas preferidas (de hoy) :
"Si depurar es el proceso de quitar los errores de software, entonces
programar debe ser el proceso de introducirlos..."
 -  Edsger Dijkstra

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting them in..."
  - Edsger Dijkstra

"If you pay peanuts you get monkeys"

