GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21589

    [SPARK-24591][CORE] Number of cores and executors in the cluster

    ## What changes were proposed in this pull request?
    
    In the PR, I propose to extend `SparkContext` by:
    
1. `def numCores: Int` returns the total number of CPU cores across all 
executors currently registered in the cluster. The main use case is sizing 
partition counts for _repartition()_ and _coalesce()_.
    
2. `def numExecutors: Int` returns the total number of executors currently 
registered in the cluster. Some jobs, e.g. local-node ML training, are highly 
parallel, and a common practice is to distribute such jobs with exactly one 
partition per executor. 
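
    The sizing logic these methods enable can be sketched without Spark 
itself. The helper names below are hypothetical (not part of this PR); they 
only illustrate how `numCores` and `numExecutors` would feed into partition 
counts for the two use cases above:

```python
def partitions_for_shuffle(num_cores: int, factor: int = 2) -> int:
    """Hypothetical helper: size repartition()/coalesce() from the
    cluster's core count, oversubscribing by a small factor."""
    return max(1, num_cores * factor)


def partitions_for_ml(num_executors: int) -> int:
    """Hypothetical helper: one partition per executor, the common
    layout for local-node ML training jobs."""
    return max(1, num_executors)


# Illustrative cluster: 4 executors x 8 cores = 32 cores total.
print(partitions_for_shuffle(32))  # 64
print(partitions_for_ml(4))        # 4
```

    With the proposed API, the inputs would come from `sc.numCores` and 
`sc.numExecutors` instead of hard-coded values.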
    
    ## How was this patch tested?
    
    - R API was tested manually from `sparkR`
    - Added tests for PySpark and `JavaSparkContext` that check the number of 
cores and executors in `local` mode.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 num-cores-and-executors

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21589.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21589
    
----
commit 9d44d7d4d86e8549cc4e524a7ea3d818b41084f2
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-16T18:14:42Z

    Methods returns total number of cores and executors in the cluster

commit c6b354c466677c1101b30fc1b25ddc5750c8eaf6
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-16T18:19:09Z

    Update Java's Spark Context

commit 54f04369c0f3329e8c27ad405a350ee20b788b21
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-16T18:57:08Z

    Tests for number of cores and executors in the local mode

commit 4d645829c8d338451be81c4554cc1257b459f6a6
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-16T20:09:35Z

    Adding coresCount and executorsCount to PySpark

commit 79633d9a3e7aebf40ee8940e8fcf00d43dc22ed7
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-18T13:08:54Z

    Improving comments

commit d7e94e10794964022d3dc98671b86f02af80d2e8
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-18T13:31:48Z

    Renaming of the methods

commit 9be566f1ed3e066de7e3d3ad557756d22fc22a73
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-18T18:41:06Z

    New methods for SparkR

----


---
