How to use Fair Scheduler Pools

2023-03-08 Thread 李杰
I have two questions:

1. I wrote a demo following the official documentation
(https://spark.apache.org/docs/latest/job-scheduling.html), but it didn't
behave as I expected, and I don't know whether my code is at fault. With the
fairscheduler.xml below, I expect pool1 to always run its tasks before pool2.
2. What is the relationship between "spark.scheduler.mode" and the
"schedulingMode" element in fairscheduler.xml?

import java.text.SimpleDateFormat
import java.util.Date
import java.util.concurrent.TimeUnit

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MultiJobTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("test-pool")
      .setMaster("local[1]")
      .set("spark.scheduler.mode", "FAIR")
      .set("spark.scheduler.allocation.file", "file:///D:/tmp/input/fairscheduler.xml")
    val sparkContext = new SparkContext(conf)

    val data: RDD[String] = sparkContext.textFile("file:///D:/tmp/input/input.txt")
    val rdd = data.flatMap(_.split(","))
      .map(x => (x(0), x(0)))

    new Thread(() => {
      // Jobs submitted from this thread go to pool1.
      sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
      rdd.foreachAsync { _ =>
        println("1==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
        Thread.sleep(1)
        println("1==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
      }
    }).start()

    new Thread(() => {
      // Jobs submitted from this thread go to pool2.
      sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
      rdd.foreachAsync { _ =>
        println("2==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
        Thread.sleep(1)
        println("2==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
      }
    }).start()

    TimeUnit.MINUTES.sleep(2)
    sparkContext.stop()
  }
}

fairscheduler.xml

<?xml version="1.0"?>
<allocations>
  <pool name="pool1">
    <schedulingMode>FAIR</schedulingMode>
    <weight>100</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="pool2">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>

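On the expectation that pool1 always runs before pool2: per the job-scheduling documentation, weight buys a pool a larger share of the cores, not strict precedence, and spark.scheduler.mode selects FAIR vs. FIFO scheduling between pools, while schedulingMode in fairscheduler.xml orders the jobs inside one pool. A rough plain-Scala sketch of the fair-share comparison, based on the documented minShare/weight semantics (names are illustrative; this is not Spark's actual code):

```scala
// Illustrative sketch of how a FAIR scheduler decides which of two pools is
// offered the next free core, using only minShare and weight.
case class Pool(name: String, runningTasks: Int, minShare: Int, weight: Int)

// True if p1 should be offered the next free core before p2.
def schedulesBefore(p1: Pool, p2: Pool): Boolean = {
  val p1Needy = p1.runningTasks < p1.minShare
  val p2Needy = p2.runningTasks < p2.minShare
  if (p1Needy != p2Needy) p1Needy // a pool below its minShare goes first
  else if (p1Needy) // both needy: whichever is further below its minShare wins
    p1.runningTasks.toDouble / math.max(p1.minShare, 1) <
      p2.runningTasks.toDouble / math.max(p2.minShare, 1)
  else // neither needy: compare running tasks per unit of weight
    p1.runningTasks.toDouble / p1.weight < p2.runningTasks.toDouble / p2.weight
}

val pool1 = Pool("pool1", runningTasks = 0, minShare = 0, weight = 100)
val pool2 = Pool("pool2", runningTasks = 0, minShare = 0, weight = 1)
// With zero tasks running, the ratios tie (0/100 == 0/1), so pool2 can still
// be picked first: weight = 100 means roughly a 100x share, not "always first".
val pool1Wins = schedulesBefore(pool1, pool2)
```

With local[1] there is only one core, so tasks from the two jobs are interleaved in proportion to the weights rather than pool1 finishing first.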

input.txt


aa bb 





李杰
leedd1...@163.com


Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
Ah, that's why all the stuff about scheduler pools is under the
section "Scheduling
Within an Application
<https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application>".
 I am so used to talking with my coworkers about jobs in the sense of
applications that I forgot your typical Spark application submits multiple
"jobs", each of which has multiple stages, etc.

So in my case I need to read up more closely about YARN queues
<https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>
since I want to share resources *across* applications. Thanks Mark!

On Wed, Apr 5, 2017 at 4:31 PM Mark Hamstra <m...@clearstorydata.com> wrote:

> `spark-submit` creates a new Application that will need to get resources
> from YARN. Spark's scheduler pools will determine how those resources are
> allocated among whatever Jobs run within the new Application.
>
> Spark's scheduler pools are only relevant when you are submitting multiple
> Jobs within a single Application (i.e., you are using the same SparkContext
> to launch multiple Jobs) and you have used SparkContext#setLocalProperty to
> set "spark.scheduler.pool" to something other than the default pool before
> a particular Job intended to use that pool is started via that SparkContext.
>
> On Wed, Apr 5, 2017 at 1:11 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
> Hmm, so when I submit an application with `spark-submit`, I need to
> guarantee it resources using YARN queues and not Spark's scheduler pools.
> Is that correct?
>
> When are Spark's scheduler pools relevant/useful in this context?
>
> On Wed, Apr 5, 2017 at 3:54 PM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
> grrr... s/your/you're/
>
> On Wed, Apr 5, 2017 at 12:54 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
> Your mixing up different levels of scheduling. Spark's fair scheduler
> pools are about scheduling Jobs, not Applications; whereas YARN queues with
> Spark are about scheduling Applications, not Jobs.
>
> On Wed, Apr 5, 2017 at 12:27 PM, Nick Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> I'm having trouble understanding the difference between Spark fair
> scheduler pools
> <https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
> and YARN queues
> <https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
> Do they conflict? Does one override the other?
>
> I posted a more detailed question about an issue I'm having with this on
> Stack Overflow: http://stackoverflow.com/q/43239921/877069
>
> Nick
>
>
> --
> View this message in context: Spark fair scheduler pools vs. YARN queues
> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-fair-scheduler-pools-vs-YARN-queues-tp28572.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
>
>
>
>


Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Mark Hamstra
`spark-submit` creates a new Application that will need to get resources
from YARN. Spark's scheduler pools will determine how those resources are
allocated among whatever Jobs run within the new Application.

Spark's scheduler pools are only relevant when you are submitting multiple
Jobs within a single Application (i.e., you are using the same SparkContext
to launch multiple Jobs) and you have used SparkContext#setLocalProperty to
set "spark.scheduler.pool" to something other than the default pool before
a particular Job intended to use that pool is started via that SparkContext.
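Mark's point that "spark.scheduler.pool" must be set from the submitting thread can be sketched without Spark: SparkContext stores local properties in an InheritableThreadLocal, so each thread that submits jobs can target a different pool. The sketch below mimics that mechanism with illustrative names; it is not Spark's code.

```scala
import java.util.Properties

// Per-thread property store, as SparkContext's local properties behave:
// values set in one thread are invisible to sibling threads (child threads
// started afterwards would inherit a copy).
object LocalProps {
  private val props = new InheritableThreadLocal[Properties] {
    override def initialValue(): Properties = new Properties()
  }
  def set(key: String, value: String): Unit = props.get().setProperty(key, value)
  def get(key: String): String = props.get().getProperty(key, "default")
}

var poolSeenInThread = ""
val t = new Thread(() => {
  LocalProps.set("spark.scheduler.pool", "pool1") // affects this thread only
  poolSeenInThread = LocalProps.get("spark.scheduler.pool")
})
t.start()
t.join()
// The submitting thread saw "pool1"; the main thread still sees the default.
```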

On Wed, Apr 5, 2017 at 1:11 PM, Nicholas Chammas <nicholas.cham...@gmail.com
> wrote:

> Hmm, so when I submit an application with `spark-submit`, I need to
> guarantee it resources using YARN queues and not Spark's scheduler pools.
> Is that correct?
>
> When are Spark's scheduler pools relevant/useful in this context?
>
> On Wed, Apr 5, 2017 at 3:54 PM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> grrr... s/your/you're/
>>
>> On Wed, Apr 5, 2017 at 12:54 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>> Your mixing up different levels of scheduling. Spark's fair scheduler
>> pools are about scheduling Jobs, not Applications; whereas YARN queues with
>> Spark are about scheduling Applications, not Jobs.
>>
>> On Wed, Apr 5, 2017 at 12:27 PM, Nick Chammas <nicholas.cham...@gmail.com
>> > wrote:
>>
>> I'm having trouble understanding the difference between Spark fair
>> scheduler pools
>> <https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
>> and YARN queues
>> <https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
>> Do they conflict? Does one override the other?
>>
>> I posted a more detailed question about an issue I'm having with this on
>> Stack Overflow: http://stackoverflow.com/q/43239921/877069
>>
>> Nick
>>
>>
>> --
>> View this message in context: Spark fair scheduler pools vs. YARN queues
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-fair-scheduler-pools-vs-YARN-queues-tp28572.html>
>> Sent from the Apache Spark User List mailing list archive
>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>
>>
>>
>>


Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
Hmm, so when I submit an application with `spark-submit`, I need to
guarantee it resources using YARN queues and not Spark's scheduler pools.
Is that correct?

When are Spark's scheduler pools relevant/useful in this context?

On Wed, Apr 5, 2017 at 3:54 PM Mark Hamstra <m...@clearstorydata.com> wrote:

> grrr... s/your/you're/
>
> On Wed, Apr 5, 2017 at 12:54 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
> Your mixing up different levels of scheduling. Spark's fair scheduler
> pools are about scheduling Jobs, not Applications; whereas YARN queues with
> Spark are about scheduling Applications, not Jobs.
>
> On Wed, Apr 5, 2017 at 12:27 PM, Nick Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> I'm having trouble understanding the difference between Spark fair
> scheduler pools
> <https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
> and YARN queues
> <https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
> Do they conflict? Does one override the other?
>
> I posted a more detailed question about an issue I'm having with this on
> Stack Overflow: http://stackoverflow.com/q/43239921/877069
>
> Nick
>
>
> --
> View this message in context: Spark fair scheduler pools vs. YARN queues
> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-fair-scheduler-pools-vs-YARN-queues-tp28572.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
>
>
>


Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Mark Hamstra
grrr... s/your/you're/

On Wed, Apr 5, 2017 at 12:54 PM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> Your mixing up different levels of scheduling. Spark's fair scheduler
> pools are about scheduling Jobs, not Applications; whereas YARN queues with
> Spark are about scheduling Applications, not Jobs.
>
> On Wed, Apr 5, 2017 at 12:27 PM, Nick Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>> I'm having trouble understanding the difference between Spark fair
>> scheduler pools
>> <https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
>> and YARN queues
>> <https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
>> Do they conflict? Does one override the other?
>>
>> I posted a more detailed question about an issue I'm having with this on
>> Stack Overflow: http://stackoverflow.com/q/43239921/877069
>>
>> Nick
>>
>>
>> --
>> View this message in context: Spark fair scheduler pools vs. YARN queues
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-fair-scheduler-pools-vs-YARN-queues-tp28572.html>
>> Sent from the Apache Spark User List mailing list archive
>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>
>
>


Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Mark Hamstra
Your mixing up different levels of scheduling. Spark's fair scheduler pools
are about scheduling Jobs, not Applications; whereas YARN queues with Spark
are about scheduling Applications, not Jobs.

On Wed, Apr 5, 2017 at 12:27 PM, Nick Chammas <nicholas.cham...@gmail.com>
wrote:

> I'm having trouble understanding the difference between Spark fair
> scheduler pools
> <https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
> and YARN queues
> <https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
> Do they conflict? Does one override the other?
>
> I posted a more detailed question about an issue I'm having with this on
> Stack Overflow: http://stackoverflow.com/q/43239921/877069
>
> Nick
>
>
> ------
> View this message in context: Spark fair scheduler pools vs. YARN queues
> <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-fair-scheduler-pools-vs-YARN-queues-tp28572.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>


Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nick Chammas
I'm having trouble understanding the difference between Spark fair
scheduler pools
<https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
and YARN queues
<https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
Do they conflict? Does one override the other?

I posted a more detailed question about an issue I'm having with this on
Stack Overflow: http://stackoverflow.com/q/43239921/877069

Nick




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-fair-scheduler-pools-vs-YARN-queues-tp28572.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Fair Scheduler Pools with Kafka Streaming

2016-02-16 Thread Sebastian Piu
Yes, it is related to concurrentJobs, so you need to increase that. But that
will mean that if you get overlapping batches, those will be executed in
parallel too.
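For reference, the knob being discussed is spark.streaming.concurrentJobs, set on the SparkConf. A minimal, untested sketch (the property is undocumented and internal, and the value 4 is illustrative, not a recommendation):

```scala
import org.apache.spark.SparkConf

// Allow up to 4 streaming output jobs to run at once, with FAIR scheduling so
// a slow batch cannot monopolize the executors. Default is 1: jobs run
// one at a time, which matches the starvation behavior described above.
val conf = new SparkConf()
  .setAppName("fair-streaming")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.streaming.concurrentJobs", "4")
```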

On Tue, 16 Feb 2016, 18:33 p pathiyil <pathi...@gmail.com> wrote:

> Hi,
>
> I am trying to use Fair Scheduler Pools with Kafka Streaming. I am
> assigning each Kafka partition to its own pool. The attempt is to give each
> partition an equal share of compute time irrespective of the number of
> messages in each time window for each partition.
>
> However, fair sharing is not the behavior I observe. When running in local
> mode, with some artificial delay added to the processing of one of the
> partitions, I see that until all the messages of that partition are
> processed, the other partitions are not picked up. The total processing
> delay for that partition's messages in one time window is bigger than the
> window itself, so there are messages from other partitions available to
> process while the 'slow' partition is being processed.
>
> How is fairness calculated in this type of scheduling? Is it in some way
> related to the setting for the number of concurrentJobs?
>
> Thanks.
>


Fair Scheduler Pools with Kafka Streaming

2016-02-16 Thread p pathiyil
Hi,

I am trying to use Fair Scheduler Pools with Kafka Streaming. I am
assigning each Kafka partition to its own pool. The attempt is to give each
partition an equal share of compute time irrespective of the number of
messages in each time window for each partition.

However, fair sharing is not the behavior I observe. When running in local
mode, with some artificial delay added to the processing of one of the
partitions, I see that until all the messages of that partition are
processed, the other partitions are not picked up. The total processing
delay for that partition's messages in one time window is bigger than the
window itself, so there are messages from other partitions available to
process while the 'slow' partition is being processed.

How is fairness calculated in this type of scheduling? Is it in some way
related to the setting for the number of concurrentJobs?

Thanks.


Fair Scheduler Pools

2015-02-24 Thread pnpritchard
Hi,

I am trying to use the fair scheduler pools
(http://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools)
to schedule two jobs at the same time.

In my simple example, I have configured Spark in local mode with 2 cores
(local[2]). I have also configured two pools in fairscheduler.xml that each
have minShare = 1. With this configuration, I would expect the jobs in each
pool to each get one core. However, after running some simple experiments
and looking at the Spark UI, it doesn't seem like this is the case.

Is my understanding incorrect? If not, am I configuring things wrong? I have
copied my code and xml below.

Thanks,
Nick


code:

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("Test")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/etc/tercel/fairscheduler.xml")
val sc = new SparkContext(conf)

val input = sc.parallelize(1 to 10)

new Thread(new Runnable() {
  override def run(): Unit = {
    sc.setLocalProperty("spark.scheduler.pool", "pool1")
    val output1 = input.map { x => Thread.sleep(1000); x }
    output1.count()
  }
}).start()

new Thread(new Runnable() {
  override def run(): Unit = {
    sc.setLocalProperty("spark.scheduler.pool", "pool2")
    val output2 = input.map { x => Thread.sleep(1000); x }
    output2.count()
  }
}).start()


fairscheduler.xml:

<?xml version="1.0"?>
<allocations>
  <pool name="pool1">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
  <pool name="pool2">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
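The expectation here matches the documented minShare semantics: each pool is "needy" until it holds one running task, so with 2 cores the first two free slots should go to different pools. A toy plain-Scala simulation of that allocation (illustrative only, not Spark's actual scheduler code):

```scala
// Two pools with minShare = 1 competing for 2 cores: a pool below its
// minShare is preferred; ties fall back to runningTasks/weight, then name.
final case class PoolState(name: String, var runningTasks: Int, minShare: Int, weight: Int)

def pickNext(pools: Seq[PoolState]): PoolState =
  pools.minBy { p =>
    val needy = if (p.runningTasks < p.minShare) 0 else 1
    val ratio =
      if (needy == 0) p.runningTasks.toDouble / math.max(p.minShare, 1)
      else p.runningTasks.toDouble / p.weight
    (needy, ratio, p.name)
  }

val pools = Seq(PoolState("pool1", 0, 1, 1), PoolState("pool2", 0, 1, 1))
val first = pickNext(pools); first.runningTasks += 1
val second = pickNext(pools); second.runningTasks += 1
// first and second are different pools: each job should hold one core, so the
// two counts' 1-second tasks are expected to interleave in the Spark UI.
```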




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Fair-Scheduler-Pools-tp21791.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org