Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21589

@MaxGekk The example you cite is literally one of a handful of usages that is not easily overridden, and it is prefixed with a 'HACK ALERT'! A few others are in mllib, typically for reading a schema.

To reiterate, the solutions currently available to users (sketched below) are:

* Rely on `defaultParallelism`: this gives the expected result unless explicitly overridden by the user.
* If you need fine-grained information about executors, use a spark listener; it is trivial to keep a count with `onExecutorAdded`/`onExecutorRemoved`.
* If you simply want the current value without your own listener, query the REST api for the current executors. Having said this, I would caution against this approach if you are concerned about performance.

`defaultParallelism` exists to provide a default when the user does not explicitly override the partition count while creating an `RDD`, and it reflects the current number of executors. Particularly when dynamic resource allocation is enabled, this value is not optimal: spark will acquire or release resources based on pending tasks.

Using available cluster resources (obtained from the cluster manager, not spark) to model parallelism would be a better approach: externalize your configs and populate them based on the resources available to the application (in your example, the difference between test/staging/production).
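A minimal sketch of the first option, assuming an active `SparkContext` in scope as `sc`:

```scala
// `parallelize` falls back to sc.defaultParallelism when no explicit
// partition count is given; the value tracks the executors/cores
// currently registered with the scheduler.
val byDefault  = sc.parallelize(1 to 1000000)      // uses sc.defaultParallelism
val overridden = sc.parallelize(1 to 1000000, 64)  // explicit user override

println(s"defaultParallelism   = ${sc.defaultParallelism}")
println(s"byDefault partitions = ${byDefault.getNumPartitions}")
```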
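A minimal sketch of the listener approach; the class name is illustrative, but the `SparkListener` callbacks are the real API:

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Keeps a running count of executors as they come and go.
class ExecutorCountListener extends SparkListener {
  private val count = new AtomicInteger(0)

  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit =
    count.incrementAndGet()

  override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit =
    count.decrementAndGet()

  def currentExecutors: Int = count.get()
}

// Register it once, then query currentExecutors whenever needed
// (assumes `sc` is an active SparkContext):
// val listener = new ExecutorCountListener
// sc.addSparkListener(listener)
```

Note that a listener registered after executors have already come up will miss their earlier `onExecutorAdded` events, so register it early in the application's lifetime.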
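And a rough sketch of the REST option against the monitoring api's `executors` endpoint; the host is an assumption (4040 is only the default driver UI port), and a real client would use a proper HTTP/JSON library:

```scala
import scala.io.Source

// The monitoring REST api lists all active executors (driver included)
// as a JSON array, one entry per executor.
val appId = sc.applicationId  // assumes `sc` is an active SparkContext
val url   = s"http://localhost:4040/api/v1/applications/$appId/executors"

val json = Source.fromURL(url).mkString
println(json)  // parse with your preferred JSON library to count entries
```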