[ https://issues.apache.org/jira/browse/SPARK-25678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-25678.
-------------------------------
    Resolution: Won't Fix

If there is more work to be done to make resource managers pluggable, I'd put that under SPARK-19700. If it's about merging support for this cluster manager, no, I am pretty certain that would not happen in Spark.

> SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)
> ------------------------------------------------------------------------
>
>                 Key: SPARK-25678
>                 URL: https://issues.apache.org/jira/browse/SPARK-25678
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>    Affects Versions: 3.0.0
>            Reporter: Utkarsh Maheshwari
>            Priority: Major
>
> I sent an email on the dev mailing list but got no response, hence filing a
> JIRA ticket.
>
> PBS (Portable Batch System) Professional is an open-source workload
> management system for HPC clusters. Many organizations that use PBS to manage
> their clusters also use Spark for big data, but they are forced to divide the
> cluster into a Spark cluster and a PBS cluster. They do this either by
> physically splitting the cluster nodes into two groups or by starting the
> Spark Standalone cluster manager's Master and Slaves as PBS jobs. Both
> approaches lead to underutilization of resources.
>
> I am trying to add support in Spark for using PBS as a pluggable cluster
> manager. After going through the Spark codebase and looking at the Mesos and
> Kubernetes integrations, I found that we can get this working as follows:
>
> - Extend `ExternalClusterManager`.
> - Extend `CoarseGrainedSchedulerBackend`.
>   - This class can start `Executors` as PBS jobs.
>   - The initial set of `Executors` is started in `onStart`.
>   - More `Executors` can be started as and when required using
>     `doRequestTotalExecutors`.
>   - `Executors` can be killed using `doKillExecutors`.
> - Extend `SparkApplication` to start the `Driver` as a PBS job in cluster
>   deploy mode.
>   - This extended class can resubmit the Spark application as a PBS job
>     with deploy mode = client, so that the application driver is started on
>     a node in the cluster.
>
> I have a few questions:
> - Does this seem like a good idea, or should we look at other options?
> - What are the expectations for the initial prototype?
> - If this works, would the Spark maintainers be interested in merging it, or
>   would they want it to be maintained as a fork?
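The `CoarseGrainedSchedulerBackend` plan in the description maps executor lifecycle hooks onto PBS job submission and deletion. The following is a minimal dry-run sketch of that mapping. It is written in Python rather than Spark's Scala so that it runs without Spark. The class `PbsExecutorLauncher`, its methods, the fake job IDs, and the `./start-executor.sh` executor command are all hypothetical. The `qsub`/`qdel` command lines are only built, never executed. This is neither Spark API nor the reporter's implementation; it is an illustration under those assumptions.

```python
"""Hypothetical dry-run sketch: PBS Professional qsub/qdel commands that a
scheduler backend might issue for the hooks named in SPARK-25678
(onStart, doRequestTotalExecutors, doKillExecutors). Not Spark API."""

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PbsExecutorLauncher:
    initial_executors: int
    cores_per_executor: int
    memory_gb: int
    # Hypothetical placeholder for whatever starts one executor on a PBS node.
    executor_command: List[str] = field(default_factory=lambda: ["./start-executor.sh"])
    # executor id -> PBS job id (job ids are faked in this dry run)
    running: Dict[str, str] = field(default_factory=dict)
    # Every command line this sketch would have run, in order.
    issued: List[List[str]] = field(default_factory=list)
    next_id: int = 0

    def _qsub(self, executor_id: str) -> List[str]:
        # One PBS job per executor, sized with a PBS Pro "select" chunk.
        select = f"select=1:ncpus={self.cores_per_executor}:mem={self.memory_gb}gb"
        return ["qsub", "-N", f"spark-exec-{executor_id}", "-l", select,
                "--", *self.executor_command, executor_id]

    def _launch_one(self) -> None:
        executor_id = str(self.next_id)
        self.next_id += 1
        self.issued.append(self._qsub(executor_id))
        # A real backend would parse the job id printed by qsub; we fake one.
        self.running[executor_id] = f"{executor_id}.pbs-server"

    def on_start(self) -> None:
        # Mirrors onStart: submit the initial executors.
        for _ in range(self.initial_executors):
            self._launch_one()

    def request_total_executors(self, total: int) -> bool:
        # Mirrors doRequestTotalExecutors: only scales up in this sketch.
        while len(self.running) < total:
            self._launch_one()
        return True

    def kill_executors(self, executor_ids: List[str]) -> bool:
        # Mirrors doKillExecutors: qdel the PBS job behind each known executor.
        for executor_id in executor_ids:
            job_id = self.running.pop(executor_id, None)
            if job_id is not None:
                self.issued.append(["qdel", job_id])
        return True


if __name__ == "__main__":
    backend = PbsExecutorLauncher(initial_executors=2, cores_per_executor=4, memory_gb=8)
    backend.on_start()
    backend.request_total_executors(3)
    backend.kill_executors(["0"])
    for cmd in backend.issued:
        print(" ".join(cmd))
```

In this sketch, scaling down is left to `kill_executors`. The plan in the description likewise keeps killing executors (`doKillExecutors`) as a separate hook from requesting a target total (`doRequestTotalExecutors`).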