[jira] [Resolved] (SPARK-25678) SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)

Sean Owen (JIRA) Wed, 21 Nov 2018 07:29:14 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-25678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Owen resolved SPARK-25678.
-------------------------------
    Resolution: Won't Fix

If there is more work to be done to make resource managers pluggable, I'd put 
that under SPARK-19700. If it's about merging support for this cluster manager, 
no I am pretty certain that would not happen in Spark.

> SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)
> ------------------------------------------------------------------------
>
>                 Key: SPARK-25678
>                 URL: https://issues.apache.org/jira/browse/SPARK-25678
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>    Affects Versions: 3.0.0
>            Reporter: Utkarsh Maheshwari
>            Priority: Major
>
> I sent an email on the dev mailing list but got no response, hence filing a 
> JIRA ticket.
>  
> PBS (Portable Batch System) Professional is an open sourced workload 
> management system for HPC clusters. Many organizations using PBS for managing 
> their cluster also use Spark for Big Data but they are forced to divide the 
> cluster into Spark cluster and PBS cluster either physically dividing the 
> cluster nodes into two groups or starting Spark Standalone cluster manager's 
> Master and Slaves as PBS jobs, leading to underutilization of resources.
>  
>  I am trying to add support in Spark to use PBS as a pluggable cluster 
> manager. Going through the Spark codebase and looking at Mesos and Kubernetes 
> integration, I found that we can get this working as follows:
>  
>  - Extend `ExternalClusterManager`.
>  - Extend `CoarseGrainedSchedulerBackend`
>    - This class can start `Executors` as PBS jobs.
>    - The initial number of `Executors` are started `onStart`.
>    - More `Executors` can be started as and when required using 
> `doRequestTotalExecutors`.
>    - `Executors` can be killed using `doKillExecutors`.
>  - Extend `SparkApplication` to start `Driver` as a PBS job in cluster deploy 
> mode.
>    - This extended class can submit the Spark application again as a PBS job 
> which with deploy mode = client, so that the application driver is started on 
> a node in the cluster.
>  
>  I have a couple of questions:
>  - Does this seem like a good idea to do this or should we look at other 
> options?
>  - What are the expectations from the initial prototype?
>  - If this works, would Spark maintainers look forward to merging this or 
> would they want it to be maintained as a fork?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-25678) SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)

Reply via email to