I can't speak to Mesos solutions, but for YARN you can define queues in
which to run your jobs, and you can customize the amount of resources each
queue consumes.  When deploying your Spark job, you can specify the --queue
<queue_name> option to schedule the job to a particular queue.  Here are
some links for reference:

http://hadoop.apache.org/docs/r2.5.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
http://lucene.472066.n3.nabble.com/Capacity-Scheduler-on-YARN-td4081242.html
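
To make this concrete, here is a rough sketch (the queue names, capacities,
class, and jar below are only illustrative placeholders, not taken from any
real setup). You would declare the queues and their shares of the cluster in
capacity-scheduler.xml:

  <!-- illustrative 70/30 split between two example queues -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>production,adhoc</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.production.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
    <value>30</value>
  </property>

and then point a job at one of the queues when you submit it:

  # MyApp and myapp.jar are placeholders for your own application
  spark-submit --master yarn-cluster --queue adhoc \
    --class com.example.MyApp myapp.jar

With the CapacityScheduler, a queue can typically grow beyond its configured
share when the cluster has idle capacity, which is what helps with the
utilization problem described below.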



From:  Luis Guerra <luispelay...@gmail.com>
Date:  Tuesday, January 13, 2015 at 3:19 AM
To:  "Buttler, David" <buttl...@llnl.gov>, user <user@spark.apache.org>
Subject:  Re: Spark executors resources. Blocking?

Thanks for your answer David,

It is as I thought, then. When you write that there could be some approaches
to solve this using Yarn or Mesos, can you give any idea about this? Or
better yet, is there any site with documentation about this issue?
Currently, we are launching our jobs using Yarn, but we still do not know
how to properly schedule our jobs to get the highest utilization of our
cluster.

Best

On Tue, Jan 13, 2015 at 2:12 AM, Buttler, David <buttl...@llnl.gov> wrote:
> Spark has a built-in cluster manager (the Spark standalone cluster), but it
> is not designed for multi-tenancy: either multiple people using the system, or
> multiple tasks sharing resources.  It is a first-in, first-out queue of tasks
> where tasks will block until the previous tasks are finished (as you
> described).  If you want higher utilization of your cluster, then you
> should use either Yarn or Mesos to schedule the system.  The same issues will
> come up, but they have a much broader range of approaches that you can take to
> solve the problem.
>  
> Dave
>  
> From: Luis Guerra [mailto:luispelay...@gmail.com]
> Sent: Monday, January 12, 2015 8:36 AM
> To: user
> Subject: Spark executors resources. Blocking?
> 
>  
> 
> Hello all,
> 
>  
> 
> I have a naive question regarding how Spark uses the executors in a cluster of
> machines. Imagine the scenario in which I do not know the input size of my
> data in execution A, so I set Spark to use 20 nodes (out of my 25, for
> instance). At the same time, I also launch a second execution B, setting Spark
> to use 10 nodes for this.
> 
>  
> 
> Assuming a huge input size for execution A, which implies an execution time of
> 30 minutes for example (using all the resources), and a constant execution
> time of 10 minutes for B, then the two executions together will take 40 minutes
> (I assume that B cannot be launched until 10 resources are completely
> available, i.e. when A finishes).
> 
>  
> 
> Now, assuming a very small input size for execution A running for 5 minutes in
> only 2 of the 20 planned resources, I would like execution B to be launched at
> that time, so that the two executions together consume only 10 minutes (and 12
> resources). However, as execution A has set Spark to use 20 resources,
> execution B has to wait until A has finished, so the total execution time is
> 15 minutes.
> 
>  
> 
> Is this right? If so, how can I solve this kind of scenario? If I am wrong,
> what would be the correct interpretation of this?
> 
>  
> 
> Thanks in advance,
> 
>  
> 
> Best


