Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21589

I don't accept your assertions of what constitutes the majority and minority of Spark users or use cases, or their relative importance. As a long-time maintainer of the Spark scheduler, it is also not my concern to define which Spark users are important or not, but rather to foster system internals and a public API that benefit all users.

I have already pointed out with some specificity how exposing the scheduler's low-level accounting of the number of cores or executors available at some point in time can encourage anti-patterns and sub-optimal Job execution. Repartitioning based upon a snapshot of the number of cores available cluster-wide is clearly not the correct thing to do in many instances and use cases.

Beyond concern for users, as a developer of Spark internals, I don't appreciate being pinned to particular implementation details by having them directly exposed to users.

And I'll repeat: this JIRA and PR look to be defining the problem to fit a preconception of the solution. Even for the particular users and use cases targeted by this PR, I wouldn't expect those users to embrace "I can't repartition based upon the scheduler's notion of the number of cores in the cluster at some point" as a more accurate statement of their problem than "My Spark Jobs don't use all of the CPU resources that I am entitled to use."

Even if we were to stipulate that a `repartition` call is inherently the only or best place to address that real user problem (and I am far from convinced that this is the only or best approach), I'd be far happier extending the `repartition` API to include declarative goals than exposing to users only part of what is needed from Spark's internals to figure out the best repartitioning -- perhaps something along the lines of `repartition(MaximizeCPUs)` or other appropriate policy/goal enumerations.

And spark packages are not irrelevant here. In fact, a large part of their motivation was to handle extensions that are not appropriate for all users, or to prove out ideas and APIs that are not yet clearly appropriate for inclusion in Spark itself.
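To make the declarative-goal suggestion above a bit more concrete, here is a rough sketch of what a policy-based `repartition` could look like. Nothing in it exists in Spark: `RepartitionGoal`, `MaximizeCPUs`, `MinimizeShuffle`, `TargetPartitionBytes`, and the enrichment method are all hypothetical names used only for illustration, and the fallback to `defaultParallelism` is a placeholder rather than a real scheduler-aware policy.

```scala
// Hypothetical sketch only: RepartitionGoal, MaximizeCPUs, and this
// repartition(goal) enrichment do not exist in Spark; they illustrate
// what a declarative, policy-based API could look like.
import org.apache.spark.sql.Dataset

// Goals the scheduler could interpret using its own internal knowledge,
// so user code never needs a snapshot of core/executor counts.
sealed trait RepartitionGoal
case object MaximizeCPUs extends RepartitionGoal
case object MinimizeShuffle extends RepartitionGoal
final case class TargetPartitionBytes(bytes: Long) extends RepartitionGoal

object DeclarativeRepartition {
  implicit class GoalOps[T](private val ds: Dataset[T]) extends AnyVal {
    // Translate a goal into a partition count. A real implementation would
    // live inside Spark and consult the scheduler; defaultParallelism is
    // used here only so the example runs.
    def repartition(goal: RepartitionGoal): Dataset[T] = {
      val sc = ds.sparkSession.sparkContext
      val numPartitions = goal match {
        case MaximizeCPUs            => sc.defaultParallelism
        case MinimizeShuffle         => math.max(1, ds.rdd.getNumPartitions / 2)
        case TargetPartitionBytes(_) => sc.defaultParallelism // placeholder
      }
      ds.repartition(numPartitions)
    }
  }
}

// Usage (hypothetical):
//   import DeclarativeRepartition._
//   val rebalanced = df.repartition(MaximizeCPUs)
```

The point of the sketch is that the user states an intent and Spark decides the partition count internally, rather than the user reading a core count out of the scheduler and doing the arithmetic themselves.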