GitHub user erenavsarogullari opened a pull request:

    https://github.com/apache/spark/pull/15326

    [SPARK-17759] [CORE] SchedulableBuilder should avoid creating duplicate fair scheduler pools

    ## What changes were proposed in this pull request?
    If the file referenced by `spark.scheduler.allocation.file` defines the same pool more than once, every definition is turned into a pool when `SparkContext` is initialized, but only one of them is ever used and the others are **redundant**. This redundant pool creation needs to be fixed.
    
    **Code to Reproduce**:
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    
    val conf = new SparkConf().setAppName("spark-fairscheduler").setMaster("local")
    conf.set("spark.scheduler.mode", "FAIR")
    conf.set("spark.scheduler.allocation.file", "src/main/resources/fairscheduler-duplicate-pools.xml")
    val sc = new SparkContext(conf)
    ```
    
    **fairscheduler-duplicate-pools.xml**:
    
    The following sample shows only two duplicated pools, **default** and **duplicate_pool1**, but the fix also needs to handle **N** duplicate definitions of **default** and/or any other pool.
    
    ```xml
    <allocations>
        <pool name="default">
            <minShare>0</minShare>
            <weight>1</weight>
            <schedulingMode>FAIR</schedulingMode>
        </pool>
        <pool name="default">
            <minShare>0</minShare>
            <weight>1</weight>
            <schedulingMode>FAIR</schedulingMode>
        </pool>
        <pool name="duplicate_pool1">
            <minShare>1</minShare>
            <weight>1</weight>
            <schedulingMode>FAIR</schedulingMode>
        </pool>
        <pool name="duplicate_pool1">
            <minShare>2</minShare>
            <weight>2</weight>
            <schedulingMode>FAIR</schedulingMode>
        </pool>
    </allocations>
    ```
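    
    Before looking at the runtime state, here is a small standalone sketch of how duplicate `<pool>` names in such a file can be detected while it is parsed. This is plain Scala plus the scala-xml module, not Spark's own code path, and the object name is made up for illustration:
    
    ```scala
    // Hypothetical standalone sketch (not Spark's parser): walk the <pool>
    // elements of the allocation file above and flag names that occur twice.
    import scala.collection.mutable
    import scala.xml.XML
    
    object DuplicatePoolCheck {
      def main(args: Array[String]): Unit = {
        val xml = XML.loadFile("src/main/resources/fairscheduler-duplicate-pools.xml")
        val seen = mutable.Set.empty[String]
        for (poolNode <- xml \\ "pool") {
          val poolName = (poolNode \ "@name").text
          if (seen.add(poolName)) {
            println(s"Building pool '$poolName'")   // first definition wins
          } else {
            println(s"WARN: pool '$poolName' is defined more than once; ignoring the duplicate")
          }
        }
      }
    }
    ```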
    
    **Debug Screenshots**:
    The screenshots below show that `Pool.schedulableQueue` (a `ConcurrentLinkedQueue[Schedulable]`) holds **4** pools:
    
    > default, default, duplicate_pool1, duplicate_pool1
    
    while `Pool.schedulableNameToSchedulable` (a `ConcurrentHashMap[String, Schedulable]`) holds only
    
    > default and duplicate_pool1
    
    because the pool name is the map key. As a result, one copy each of **default** and **duplicate_pool1** is **redundant** and lingers in `Pool.schedulableQueue`.
    
    <img width="530" alt="duplicate_pools" src="https://cloud.githubusercontent.com/assets/1437738/19020475/994fbcfc-88a1-11e6-9d11-102023461d3d.png">
    <img width="541" alt="duplicate_pools2" src="https://cloud.githubusercontent.com/assets/1437738/19020476/9969e4d8-88a1-11e6-9732-1e91f942c570.png">
    
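    For reference, the 4-vs-2 mismatch can be reproduced without Spark at all. The following standalone Scala sketch (the object name is made up) mirrors the two structures inside `Pool`: the queue accepts every addition, while the name-keyed map overwrites on a repeated key:
    
    ```scala
    import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}
    
    object QueueVsMapIllustration {
      def main(args: Array[String]): Unit = {
        val queue = new ConcurrentLinkedQueue[String]()       // like Pool.schedulableQueue
        val byName = new ConcurrentHashMap[String, String]()  // like Pool.schedulableNameToSchedulable
    
        for (name <- Seq("default", "default", "duplicate_pool1", "duplicate_pool1")) {
          queue.add(name)        // keeps duplicates
          byName.put(name, name) // last write wins per key
        }
    
        println(queue.size())   // 4
        println(byName.size())  // 2
      }
    }
    ```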
    
    ## How was this patch tested?
    
    Added a new unit test case.
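    
    For a sense of what such a test could check, here is a hypothetical sketch, not the exact test in the patch. It assumes a ScalaTest suite living under `org.apache.spark.scheduler` so that the `private[spark]` classes `Pool` and `FairSchedulableBuilder` and the fields `schedulableQueue` and `schedulableNameToSchedulable` are visible:
    
    ```scala
    // Hypothetical test sketch: build pools from an allocation file that contains
    // duplicates and assert that only one Pool per distinct name is registered.
    test("SchedulableBuilder should not create duplicate fair scheduler pools") {
      val conf = new SparkConf().set("spark.scheduler.allocation.file",
        getClass.getResource("/fairscheduler-duplicate-pools.xml").getFile)
      val rootPool = new Pool("", SchedulingMode.FAIR, 0, 0)
    
      new FairSchedulableBuilder(rootPool, conf).buildPools()
    
      // Only the distinct names should survive: default and duplicate_pool1.
      assert(rootPool.schedulableQueue.size === 2)
      assert(rootPool.schedulableNameToSchedulable.size === 2)
    }
    ```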


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erenavsarogullari/spark SPARK-17759

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15326
    
----
commit 5512d5a7f5027725b9dd15ae895fc3f71885933f
Author: erenavsarogullari <[email protected]>
Date:   2016-10-02T12:04:05Z

    SchedulableBuilder should avoid to create duplicate fair scheduler-pools.

----

