Happy Diwali everyone!!!

2019-10-27 Thread Xiao Li
Happy Diwali everyone!!!

Xiao


Re: Spark - configuration setting doesn't work

2019-10-27 Thread hemant singh
You should add the configurations while creating the session; I don't think
you can override most of them once the session is created. A few can be
changed at runtime, though.
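
For example, a rough sketch (which configurations are runtime-settable varies
by Spark version, and the app name here is a placeholder):

import org.apache.spark.sql.SparkSession

// Configs read at startup (driver/executor memory, serializer, speculation,
// ...) must be set on the builder, or passed to spark-submit, before the
// SparkContext is created:
val spark = SparkSession.builder()
  .appName("APP_NAME")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.executor.memory", "10g")
  .getOrCreate()

// Runtime SQL configs can still be changed on the live session:
spark.conf.set("spark.sql.broadcastTimeout", "36000")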

Thanks,
Hemant

On Sun, 27 Oct 2019 at 11:02 AM, Chetan Khatri wrote:

> Could someone please help me?
>
> On Thu, Oct 17, 2019 at 7:29 PM Chetan Khatri wrote:
>
>> Hi Users,
>>
>> I am setting the Spark configuration in the following way:
>>
>> val spark = SparkSession.builder().appName(APP_NAME).getOrCreate()
>>
>> spark.conf.set("spark.speculation", "false")
>> spark.conf.set("spark.broadcast.compress", "true")
>> spark.conf.set("spark.sql.broadcastTimeout", "36000")
>> spark.conf.set("spark.network.timeout", "2500s")
>> spark.conf.set("spark.serializer", 
>> "org.apache.spark.serializer.KryoSerializer")
>> spark.conf.set("spark.driver.memory", "10g")
>> spark.conf.set("spark.executor.memory", "10g")
>>
>> import spark.implicits._
>>
>>
>> and submitting the Spark job with spark-submit, but none of the above
>> configurations is reflected in the job; I have checked in the Spark UI.
>>
>> I know that setting these while creating the SparkSession object works
>> well:
>>
>>
>> val spark = SparkSession.builder().appName(APP_NAME)
>>   .config("spark.network.timeout", "1500s")
>>   .config("spark.broadcast.compress", "true")
>>   .config("spark.sql.broadcastTimeout", "36000")
>>   .getOrCreate()
>>
>> import spark.implicits._
>>
>>
>> Can someone please shed some light on this?
>>
>>


Re: Spark Cluster over yarn cluster monitoring

2019-10-27 Thread Jörn Franke
Use yarn queues:

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
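
For example, with a queue defined per customer, YARN arbitrates resources
between them and you submit each customer's jobs to its dedicated queue.
A sketch (the queue name, class, and jar are made up):

spark-submit \
  --master yarn \
  --queue customer_y \
  --class com.example.CustomerWorkflow \
  customer-workflow.jar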

> On 27 Oct 2019, at 06:41, Chetan Khatri wrote:
> 
> 
> Could someone please help me understand this better?
> 
>> On Thu, Oct 17, 2019 at 7:41 PM Chetan Khatri wrote:
>> Hi Users,
>> 
>> I submit X jobs with Airflow to YARN as part of a workflow for customer Y.
>> I could potentially run the workflow for customer Z as well, but I need to
>> check how many resources are available on the cluster before jobs for the
>> next customer start.
>> 
>> Could you please tell me the best way to handle this? Currently, I just
>> check availableMB > 100 and then trigger the next Airflow DAG on YARN:
>> 
>> GET http://rm-http-address:port/ws/v1/cluster/metrics
>> Thanks.
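
A minimal sketch of that capacity check in Scala (standard library only; the
ResourceManager host and its default HTTP port 8088 are assumptions, as is the
100 MB threshold from the thread, and the JSON is parsed with a crude regex
rather than a JSON library):

import scala.io.Source

// Fetch cluster-wide metrics from the YARN ResourceManager REST API.
val metricsJson =
  Source.fromURL("http://rm-http-address:8088/ws/v1/cluster/metrics").mkString

// Pull availableMB out of the JSON response.
val availableMB = "\"availableMB\"\\s*:\\s*(\\d+)".r
  .findFirstMatchIn(metricsJson)
  .map(_.group(1).toInt)
  .getOrElse(0)

if (availableMB > 100) {
  // Enough headroom: trigger the next customer's Airflow DAG here,
  // e.g. via the Airflow CLI or REST API.
  println(s"Capacity available ($availableMB MB), starting next workflow")
}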