GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21589
[SPARK-24591][CORE] Number of cores and executors in the cluster
## What changes were proposed in this pull request?
In this PR, I propose to extend `SparkContext` with:
1. `def numCores: Int` returns the total number of CPU cores of all executors
currently registered in the cluster. The main use case is choosing the number
of partitions to pass to _repartition()_ and _coalesce()_.
2. `def numExecutors: Int` returns the total number of executors currently
registered in the cluster. Some jobs, e.g., local-node ML training, are highly
parallel. A common practice is to distribute such jobs so that each executor
gets exactly one partition.
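To make the two proposed counts concrete, here is a minimal Scala sketch that does not depend on Spark. It assumes a hypothetical `ClusterSnapshot` class holding a map from executor id to core count (this is not a Spark class), plus hypothetical helpers `partitionsForCores` and `partitionsForExecutors` that turn the counts into partition numbers. The `tasksPerCore` multiplier is likewise an illustrative assumption, not part of the proposal.

```scala
// Hypothetical model of cluster state: executor id -> number of CPU cores.
// NOT Spark's API; it only illustrates what the proposed methods would count.
final case class ClusterSnapshot(executorCores: Map[String, Int]) {
  // Mirrors the proposed `numCores`: total cores over all registered executors.
  def numCores: Int = executorCores.values.sum

  // Mirrors the proposed `numExecutors`: number of registered executors.
  def numExecutors: Int = executorCores.size
}

object PartitionSizing {
  // Hypothetical helper: size partitions for repartition()/coalesce() as a
  // multiple of the total core count (never fewer than one partition).
  def partitionsForCores(snapshot: ClusterSnapshot, tasksPerCore: Int = 1): Int =
    math.max(1, snapshot.numCores * tasksPerCore)

  // Hypothetical helper: one partition per executor, as for local-node ML training.
  def partitionsForExecutors(snapshot: ClusterSnapshot): Int =
    math.max(1, snapshot.numExecutors)

  def main(args: Array[String]): Unit = {
    val snapshot = ClusterSnapshot(Map("exec-1" -> 4, "exec-2" -> 4, "exec-3" -> 8))
    println(snapshot.numCores)                              // 16
    println(snapshot.numExecutors)                          // 3
    println(partitionsForCores(snapshot, tasksPerCore = 2)) // 32
    println(partitionsForExecutors(snapshot))               // 3
  }
}
```

With the proposed API, the counts would come from `sc.numCores` and `sc.numExecutors` rather than a hand-built map, e.g. `rdd.coalesce(sc.numExecutors)`.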
## How was this patch tested?
- The R API was tested manually from `sparkR`.
- Added tests for PySpark and `JavaSparkContext` that check the number of cores
and executors in `local` mode.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 num-cores-and-executors
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21589.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21589
----
commit 9d44d7d4d86e8549cc4e524a7ea3d818b41084f2
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-16T18:14:42Z
Methods returns total number of cores and executors in the cluster
commit c6b354c466677c1101b30fc1b25ddc5750c8eaf6
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-16T18:19:09Z
Update Java's Spark Context
commit 54f04369c0f3329e8c27ad405a350ee20b788b21
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-16T18:57:08Z
Tests for number of cores and executors in the local mode
commit 4d645829c8d338451be81c4554cc1257b459f6a6
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-16T20:09:35Z
Adding coresCount and executorsCount to PySpark
commit 79633d9a3e7aebf40ee8940e8fcf00d43dc22ed7
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-18T13:08:54Z
Improving comments
commit d7e94e10794964022d3dc98671b86f02af80d2e8
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-18T13:31:48Z
Renaming of the methods
commit 9be566f1ed3e066de7e3d3ad557756d22fc22a73
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-18T18:41:06Z
New methods for SparkR
----