Re: Spark Job running on localhost on yarn cluster

2015-02-05 Thread Kostas Sakellis
Kundan,

So I think your configuration here is incorrect. We need to adjust memory
and #executors. So for your case you have:

Cluster setup:
5 nodes
16GB RAM per node
8 cores per node
The number of executors should be the total number of nodes in your cluster
- in your case 5. As for --executor-cores, it should be the total cores on
the machine minus 1 for the AM, so in your case --executor-cores=7. On to
memory: when configuring memory you need to account for the memory overhead
that Spark adds - the default is 7% of executor memory. If YARN has a max of
14GB per NodeManager and you set your executor memory to 14GB, Spark is
actually requesting 1.07*14GB = 14.98GB. You should double-check your
configuration, but if all your YARN containers have a max of 14GB then no
executors should be launching, since Spark can't get the resources it's
asking for. Maybe you have 3 NodeManagers configured with more memory?

For your setup the memory calculation is:
executorMemoryGB * 1.07 = 14GB => 14GB / 1.07 ~ 13GB
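
To make the arithmetic concrete, a minimal Scala sketch of the same sizing
logic (just an illustration - the names are made up, and it assumes the
default 7% overhead and the 14GB container cap above):

  // Sketch only: derive per-executor resources from the cluster numbers above.
  val nodes = 5
  val coresPerNode = 8
  val containerMaxGb = 14.0       // max YARN container size (yarn-site.xml)
  val overheadFraction = 0.07     // Spark's default memory overhead on YARN

  val numExecutors = nodes                               // one executor per node
  val executorCores = coresPerNode - 1                   // leave one core for the AM
  val executorMemoryGb =
    math.floor(containerMaxGb / (1 + overheadFraction))  // 14 / 1.07 ~= 13 GB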

Your command args should be something like: --master yarn-cluster
--num-executors 5 --executor-cores 7 --executor-memory 13g

As for the UI, where did you see 7.2GB? Can you send a screenshot?
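
In the meantime, a rough way to check from inside the application how many
executors actually registered (a sketch only, assuming the usual
SparkContext variable sc; getExecutorMemoryStatus includes the driver,
hence the -1):

  // Sketch: count registered executors and the storage memory each one reports.
  val executorStatus = sc.getExecutorMemoryStatus
  println(s"Registered executors: ${executorStatus.size - 1}")
  executorStatus.foreach { case (host, (maxStorageMem, _)) =>
    println(f"$host: ${maxStorageMem / (1024.0 * 1024 * 1024)}%.2f GB for storage")
  }

Also keep in mind that the memory column on the UI's Executors page shows
only the storage fraction of the heap (roughly spark.storage.memoryFraction
of it), not the full --executor-memory, so it will always read lower than
what you requested.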

Hope this helps,
Kostas


On Thursday, February 5, 2015, kundan kumar  wrote:

> The problem got resolved after removing all the configuration files from
> all the slave nodes. Earlier we were running in standalone mode, and that
> led to the configuration being duplicated on all the slaves. Once that was
> done, the job ran as expected in cluster mode.
>
> However, compared to standalone mode, Spark on YARN runs very slowly.
>
> I am running it as
>
> $SPARK_HOME/bin/spark-submit --class "EDDApp" --master yarn-cluster
> --num-executors 10 --executor-memory 14g
>  target/scala-2.10/edd-application_2.10-1.0.jar
>  hdfs://hm41:9000/user/hduser/newtrans.csv
>  hdfs://hm41:9000/user/hduser/trans-out
>
> We have a cluster of 5 nodes, each with 16GB RAM and 8 cores. We have
> configured the minimum container size as 3GB and the maximum as 14GB in
> yarn-site.xml. When submitting the job to yarn-cluster we request 10
> executors with 14GB of memory each. According to my understanding, our
> job should be allocated 4 containers of 14GB. But the Spark UI shows only
> 3 containers of 7.2GB each.
>
> We are unable to ensure the number of containers or the resources
> allocated to them, which hurts performance compared to the standalone
> mode.
>
>
>
>
> Regards,
> Kundan
>


Re: Spark Job running on localhost on yarn cluster

2015-02-05 Thread kundan kumar
The problem got resolved after removing all the configuration files from
all the slave nodes. Earlier we were running in standalone mode, and that
led to the configuration being duplicated on all the slaves. Once that was
done, the job ran as expected in cluster mode.

However, compared to standalone mode, Spark on YARN runs very slowly.

I am running it as

$SPARK_HOME/bin/spark-submit --class "EDDApp" --master yarn-cluster
--num-executors 10 --executor-memory 14g
 target/scala-2.10/edd-application_2.10-1.0.jar
 hdfs://hm41:9000/user/hduser/newtrans.csv
 hdfs://hm41:9000/user/hduser/trans-out

We have a cluster of 5 nodes, each with 16GB RAM and 8 cores. We have
configured the minimum container size as 3GB and the maximum as 14GB in
yarn-site.xml. When submitting the job to yarn-cluster we request 10
executors with 14GB of memory each. According to my understanding, our job
should be allocated 4 containers of 14GB. But the Spark UI shows only 3
containers of 7.2GB each.

We are unable to ensure the number of containers or the resources allocated
to them, which hurts performance compared to the standalone mode.




Regards,
Kundan

On Thu, Feb 5, 2015 at 12:49 PM, Felix C  wrote:

>  Is YARN_CONF_DIR set?
>
> --- Original Message ---
>
> From: "Aniket Bhatnagar" 
> Sent: February 4, 2015 6:16 AM
> To: "kundan kumar" , "spark users" <
> user@spark.apache.org>
> Subject: Re: Spark Job running on localhost on yarn cluster
>
>  Have you set master in SparkConf/SparkContext in your code? Driver logs
> show in which mode the spark job is running. Double check if the logs
> mention local or yarn-cluster.
> Also, what's the error that you are getting?
>
> On Wed, Feb 4, 2015, 6:13 PM kundan kumar  wrote:
>
> Hi,
>
>  I am trying to execute my code on a yarn cluster
>
>  The command which I am using is
>
>  $SPARK_HOME/bin/spark-submit --class "EDDApp"
> target/scala-2.10/edd-application_2.10-1.0.jar --master yarn-cluster
> --num-executors 3 --driver-memory 6g --executor-memory 7g 
>
>  But I can see that this program is running only on localhost.
>
>  It's able to read the file from HDFS.
>
>  I have tried this in standalone mode and it works fine.
>
>  Please suggest where it is going wrong.
>
>
>  Regards,
> Kundan
>
>


Re: Spark Job running on localhost on yarn cluster

2015-02-04 Thread Felix C
Is YARN_CONF_DIR set?

--- Original Message ---

From: "Aniket Bhatnagar" 
Sent: February 4, 2015 6:16 AM
To: "kundan kumar" , "spark users" 

Subject: Re: Spark Job running on localhost on yarn cluster

Have you set master in SparkConf/SparkContext in your code? Driver logs
show in which mode the spark job is running. Double check if the logs
mention local or yarn-cluster.
Also, what's the error that you are getting?

On Wed, Feb 4, 2015, 6:13 PM kundan kumar  wrote:

> Hi,
>
> I am trying to execute my code on a yarn cluster
>
> The command which I am using is
>
> $SPARK_HOME/bin/spark-submit --class "EDDApp"
> target/scala-2.10/edd-application_2.10-1.0.jar --master yarn-cluster
> --num-executors 3 --driver-memory 6g --executor-memory 7g 
>
> But I can see that this program is running only on localhost.
>
> It's able to read the file from HDFS.
>
> I have tried this in standalone mode and it works fine.
>
> Please suggest where it is going wrong.
>
>
> Regards,
> Kundan
>


Re: Spark Job running on localhost on yarn cluster

2015-02-04 Thread Michael Albert
1) Parameters like "--num-executors" should come before the jar. That is, you
want something like:

$SPARK_HOME/bin/spark-submit --num-executors 3 --driver-memory 6g \
  --executor-memory 7g --master yarn-cluster --class EDDApp \
  target/scala-2.10/eddjar \
  <your application arguments>

That is, *your* parameters come after the jar; Spark's parameters come *before*
the jar. That's how spark-submit knows which are which (at least that is my
understanding).

2) Double check that in your code, when you create the SparkContext or the
configuration object, you don't set local there. (I don't recall the exact
order of priority if the parameters disagree with the code.)
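
To make point 2 concrete, a minimal Scala sketch (assuming a Scala app with
the EDDApp class from this thread; the argument comments are only
illustrative):

  import org.apache.spark.{SparkConf, SparkContext}

  object EDDApp {
    def main(args: Array[String]): Unit = {
      // No .setMaster(...) here: let spark-submit's --master flag decide.
      // Hardcoding .setMaster("local[*]") would keep everything on one host.
      val conf = new SparkConf().setAppName("EDDApp")
      val sc = new SparkContext(conf)
      println(s"Effective master: ${sc.master}")  // should mention yarn, not local
      // args(0) / args(1): e.g. the HDFS input and output paths passed after the jar
      // ... job logic ...
      sc.stop()
    }
  }

If that prints local, something in the code or in a stale configuration file
on the slaves is overriding the submitted master.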
Good luck!
-Mike

  From: kundan kumar 
 To: spark users  
 Sent: Wednesday, February 4, 2015 7:41 AM
 Subject: Spark Job running on localhost on yarn cluster
   
Hi, 
I am trying to execute my code on a yarn cluster
The command which I am using is 
$SPARK_HOME/bin/spark-submit --class "EDDApp" 
target/scala-2.10/edd-application_2.10-1.0.jar --master yarn-cluster 
--num-executors 3 --driver-memory 6g --executor-memory 7g 

But I can see that this program is running only on localhost.
It's able to read the file from HDFS.
I have tried this in standalone mode and it works fine.
Please suggest where it is going wrong.

Regards,
Kundan

  

Re: Spark Job running on localhost on yarn cluster

2015-02-04 Thread Aniket Bhatnagar
Have you set master in SparkConf/SparkContext in your code? Driver logs
show in which mode the spark job is running. Double check if the logs
mention local or yarn-cluster.
Also, what's the error that you are getting?

On Wed, Feb 4, 2015, 6:13 PM kundan kumar  wrote:

> Hi,
>
> I am trying to execute my code on a yarn cluster
>
> The command which I am using is
>
> $SPARK_HOME/bin/spark-submit --class "EDDApp"
> target/scala-2.10/edd-application_2.10-1.0.jar --master yarn-cluster
> --num-executors 3 --driver-memory 6g --executor-memory 7g 
>
> But I can see that this program is running only on localhost.
>
> It's able to read the file from HDFS.
>
> I have tried this in standalone mode and it works fine.
>
> Please suggest where it is going wrong.
>
>
> Regards,
> Kundan
>