Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
A small hint would be very helpful.



Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
Hello Siva,
Thanks for your reply.

Actually, I'm trying to generate online reports for my clients. For this, I
want the jobs to execute faster without any job being put in the QUEUE,
irrespective of how many jobs different clients are running from different
locations.
Currently, a job processing 17GB of data takes more than 20 minutes to
execute. Also, only 6 jobs run simultaneously and the remaining ones are in
the WAITING stage.

Thanks

On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli 
wrote:

>
> Hello Akshay,
>
> I see there are 6 slaves × 1 Spark instance each × 5 cores per instance
> => 30 cores in total.
> Do you have any other pools configured? Running 8 jobs in parallel should
> be possible with the number of cores you have.
>
> For your long-running job, did you have a chance to look at the tasks
> being triggered?
>
> I would recommend configuring the slow-running job in a separate pool.
>
> Regards
> Shiv
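
For reference, a minimal fairscheduler.xml sketch along the lines Shiv
suggests, with a separate pool isolating the slow job (the pool names and
weights here are hypothetical):

<?xml version="1.0"?>
<allocations>
  <!-- hypothetical pool for the interactive report jobs -->
  <pool name="reports">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>
    <minShare>2</minShare>
  </pool>
  <!-- hypothetical pool for the slow 17GB job -->
  <pool name="slow">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>

A job opts into a pool from application code, e.g. in Scala:

  sc.setLocalProperty("spark.scheduler.pool", "slow")  // route this job's stages to the slow pool

Note that these pools only arbitrate jobs within a single SparkContext;
separate spark-submit applications are scheduled by YARN itself, so fairness
across applications is governed by the YARN scheduler's queues.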


Re: Run Multiple Spark jobs. Reduce Execution time.

2018-02-14 Thread akshay naidu
**
yarn-site.xml


 

<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.8</value>
</property>

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3584</value>
</property>

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>10752</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10752</value>
</property>
**
spark-defaults.conf

spark.master   yarn
spark.driver.memory9g
spark.executor.memory  1024m
spark.yarn.executor.memoryOverhead 1024m
spark.eventLog.enabled  true
spark.eventLog.dir hdfs://tech-master:54310/spark-logs

spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://tech-master:54310/spark-logs
spark.history.fs.update.interval  10s
spark.history.ui.port 18080

spark.ui.enabledtrue
spark.ui.port   4040
spark.ui.killEnabledtrue
spark.ui.retainedDeadExecutors  100

spark.scheduler.modeFAIR
spark.scheduler.allocation.file /usr/local/spark/current/conf/fairscheduler.xml

#spark.submit.deployMode cluster
spark.default.parallelism30

SPARK_WORKER_MEMORY 10g
SPARK_WORKER_INSTANCES 1
SPARK_WORKER_CORES 5

SPARK_DRIVER_MEMORY 9g
SPARK_DRIVER_CORES 5

SPARK_MASTER_IP Tech-master
SPARK_MASTER_PORT 7077
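
For reference, a back-of-the-envelope capacity check against these settings
(assuming client deploy mode, the default of 2 executors per application, and
one YARN ApplicationMaster container per job; this is an estimate, not from
the thread):

container size  = max(1024m executor + 1024m overhead, rounded up to the
                      3584m minimum-allocation) = 3584m
per node        = 10752m / 3584m     = 3 containers
cluster         = 6 nodes x 3        = 18 containers
per job         = 1 AM + 2 executors = 3 containers
concurrent jobs = 18 / 3             = 6 running; any further job WAITS

Under these assumptions, the observed ceiling of 6-7 concurrent jobs is the
container count running out, not the 30 cores.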

On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu 
wrote:

> Hello,
> I'm trying to run multiple Spark jobs on a cluster running on YARN.
> The master is a 24GB server with 6 slaves of 12GB each.
>
> fairscheduler.xml settings are:
>
> <pool name="...">
>   <schedulingMode>FAIR</schedulingMode>
>   <weight>10</weight>
>   <minShare>2</minShare>
> </pool>
>
> I am running 8 jobs simultaneously; the jobs run in parallel, but not all
> of them. At any time, only 7 of them run simultaneously while the 8th one
> is in the queue, WAITING for a job to stop.
>
> Also, out of the 7 running jobs, 4 run comparatively much faster than the
> remaining three (maybe resources are not distributed properly).
>
> I want to run any number of jobs at a time and make them run faster. Right
> now, one job takes more than three minutes to process at most 1GB of data.
>
> Kindly assist me; what am I missing?
>
> Thanks.
>
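
As an illustration (not from the thread), a hypothetical spark-submit sketch
showing where per-job resource caps go; the jar name and numbers are
placeholders:

spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 1g \
  --driver-memory 2g \
  report-job.jar

One thing worth checking alongside this: with spark.driver.memory set to 9g
and client deploy mode, eight concurrent drivers would need roughly 72g on
the submitting host, well beyond the 24GB master, so a smaller driver heap
per job may matter as much as the executor settings.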

