Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-27 Thread Zhiliang Zhu
Hi All,
Would some expert help me with this issue?
I would greatly appreciate your kind help.
Thank you!
Zhiliang

 
 


On Sunday, September 27, 2015 7:40 PM, Zhiliang Zhu wrote:

 Hi Alexis, Gavin,
Thanks very much for your kind comments. My spark command is:
spark-submit --class com.zyyx.spark.example.LinearRegression --master yarn-client LinearRegression.jar
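
As a possible variation in yarn-client mode, the address the driver advertises to the cluster can also be set per submission instead of in spark-env.sh; a minimal sketch, where 192.168.1.20 is only a placeholder for the gateway's cluster-routable address:

spark-submit --class com.zyyx.spark.example.LinearRegression \
  --master yarn-client \
  --conf spark.driver.host=192.168.1.20 \
  LinearRegression.jar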

Both spark-shell and spark-submit hang and never get past this stage:
15/09/27 19:18:06 INFO yarn.Client: Application report for application_1440676456544_0727 (state: ACCEPTED)...
The deeper error log under /hdfs/yarn/logs/ shows:
15/09/27 19:10:37 INFO util.Utils: Successfully started service 'sparkYarnAM' on port 53882.
15/09/27 19:10:37 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
15/09/27 19:10:37 ERROR yarn.ApplicationMaster: Failed to connect to driver at 127.0.0.1:39581, retrying ...
15/09/27 19:10:37 ERROR yarn.ApplicationMaster: Failed to connect to driver at 127.0.0.1:39581, retrying ...

On all of the machine nodes I installed just Hadoop and Spark, with the same paths, files, and configuration, and copied one of those hadoop & spark directories to the remote gateway machine, so every node has the same paths, file names, and configuration.
The Running Spark on YARN - Spark 1.5.0 documentation says: "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager."

I do not exactly understand the first sentence.
The Hadoop version is 2.5.2 and the Spark version is 1.4.1.
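
In practice the first sentence just means that the directory exported as HADOOP_CONF_DIR (or YARN_CONF_DIR) on the submitting machine must contain the cluster's client-side XML configuration files; a quick check on this setup would presumably look like:

ls /usr/lib/hadoop/etc/hadoop
# should list core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, ...
# yarn-site.xml must point at the cluster's ResourceManager address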
The spark-env.sh settings:
export SCALA_HOME=/usr/lib/scala
export JAVA_HOME=/usr/java/jdk1.7.0_45
export R_HOME=/usr/lib/r
export HADOOP_HOME=/usr/lib/hadoop
export YARN_CONF_DIR=/usr/lib/hadoop/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_MASTER_IP=master01
#export SPARK_LOCAL_IP=master02
export SPARK_LOCAL_IP=localhost
export SPARK_LOCAL_DIRS=/data/spark_local_dir
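
Since the ApplicationMaster log shows it trying to reach the driver at 127.0.0.1, one likely cause is that SPARK_LOCAL_IP=localhost makes the driver on the gateway advertise a loopback address the cluster cannot reach. A hedged sketch of a gateway-side spark-env.sh, where 192.168.1.20 is a placeholder for the gateway's real, cluster-routable address:

export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/lib/hadoop/etc/hadoop
# advertise an address the YARN nodes can route to, not localhost
export SPARK_LOCAL_IP=192.168.1.20
export SPARK_LOCAL_DIRS=/data/spark_local_dir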

Would you help point out what I did wrong? I sincerely appreciate your help.
Best Regards,
Zhiliang

On Saturday, September 26, 2015 2:27 PM, Gavin Yue wrote:

It is working; we are doing the same thing every day. But the remote server needs to be able to talk to the ResourceManager.

If you are using spark-submit, you will also specify the Hadoop conf directory in your env variables. Spark relies on that to locate the cluster's ResourceManager.

I think this tutorial is pretty clear: 
http://spark.apache.org/docs/latest/running-on-yarn.html
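
Concretely, on the remote machine the conf directory can be exported in the shell session before submitting; a sketch using the same path that appears later in this thread:

export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/lib/hadoop/etc/hadoop
spark-submit --class com.zyyx.spark.example.LinearRegression --master yarn-client LinearRegression.jar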



On Fri, Sep 25, 2015 at 7:11 PM, Zhiliang Zhu  wrote:

Hi Yue,
Thanks very much for your kind reply.
I would like to submit the Spark job remotely from another machine outside the cluster, and the job will run on YARN, similar to how the Hadoop jobs are already submitted. Could you confirm this will also work for Spark?
Do you mean that I should print those variables on the Linux command line?
Best Regards,
Zhiliang

 


On Saturday, September 26, 2015 10:07 AM, Gavin Yue wrote:

 Print out your env variables and check first 

Sent from my iPhone
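
On the gateway that would presumably just be echoing the relevant variables from the shell before submitting, for example:

echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
echo "SPARK_LOCAL_IP=$SPARK_LOCAL_IP"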
On Sep 25, 2015, at 18:43, Zhiliang Zhu  wrote:


Hi All,
I would like to submit a Spark job from another remote machine outside the cluster. I also copied the hadoop/spark conf files onto the remote machine; a Hadoop job can then be submitted, but a Spark job cannot.
In spark-env.sh it may be that SPARK_LOCAL_IP is not properly set, or there may be some other reason...
This issue is urgent for me; would some expert provide some help with this problem?
I will sincerely appreciate your help.
Thank you!
Best Regards,
Zhiliang



On Friday, September 25, 2015 7:53 PM, Zhiliang Zhu wrote:

Hi all,
The Spark job will run on YARN. Either I do not set SPARK_LOCAL_IP at all, or I set it as
export SPARK_LOCAL_IP=localhost    # or set it to the specific node IP in that node's Spark install directory

Submitting the Spark job on the master node of the cluster works well; however, it fails when submitted remotely by way of the gateway machine.
The gateway machine is already configured and works well for submitting Hadoop jobs. It is set as:
export SCALA_HOME=/usr/lib/scala
export JAVA_HOME=/usr/java/jdk1.7.0_45
export R_HOME=/usr/lib/r
export HADOOP_HOME=/usr/lib/hadoop
export YARN_CONF_DIR=/usr/lib/hadoop/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_MASTER_IP=master01
#export SPARK_LOCAL_IP=master01  # if no SPARK_LOCAL_IP is set, SparkContext will not start
export SPARK_LOCAL_IP=localhost  # if localhost is set, SparkContext starts but fails later
export SPARK_LOCAL_DIRS=/data/spark_local_dir
...

The error messages:
15/09/25 19:07:12 INFO util.Utils: Successfully started service 'sparkYarnAM' on port 48133.
15/09/25 19:07:12 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to driver at 127.0.0.1:35706, retrying ...
15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to driver at 127.0.0.1:35706, retrying ...
15/09/25 19:07:12 ERROR yarn.ApplicationMaster: Failed to connect to driver at 127.0.0.1:35706, retrying ...
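
The 127.0.0.1 driver address in these messages usually means the submitting machine's hostname resolves to the loopback address. A hedged way to check on a Linux gateway, with gateway01 and 192.168.1.20 as placeholder hostname and address:

hostname -i                  # if this prints 127.0.0.1, the hostname maps to loopback
getent hosts $(hostname)     # shows what the hostname actually resolves to
# /etc/hosts may need an entry mapping the hostname to the gateway's real IP, e.g.:
# 192.168.1.20   gateway01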

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-26 Thread Gavin Yue
It is working; we are doing the same thing every day. But the remote server needs to be able to talk to the ResourceManager.

If you are using spark-submit, you will also specify the Hadoop conf directory in your env variables. Spark relies on that to locate the cluster's ResourceManager.

I think this tutorial is pretty clear:
http://spark.apache.org/docs/latest/running-on-yarn.html




Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Gavin Yue
Print out your env variables and check first 

Sent from my iPhone



Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
Hi Yue,
Thanks very much for your kind reply.
I would like to submit the Spark job remotely from another machine outside the cluster, and the job will run on YARN, similar to how the Hadoop jobs are already submitted. Could you confirm this will also work for Spark?
Do you mean that I should print those variables on the Linux command line?
Best Regards,
Zhiliang

 

