[GitHub] spark issue #19354: [SPARK-20992][Scheduler] Add links in documentation to Nomad integration
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/19354

I've updated the PR to leave the Kubernetes links as they are, and just to add a link to the Nomad integration project with wording indicating that it's not supported by the Spark project.
[GitHub] spark issue #19354: [SPARK-20992][Scheduler] Add links in documentation to Nomad integration
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/19354

I totally understand the reluctance to have non-ASF projects in a list headed by "The system currently supports…". Looking at the [Powered By](https://spark.apache.org/powered-by.html) page, it doesn't look like the best way to help users find third-party cluster integrations. The [Spark Packages](https://spark-packages.org) site @srowen mentions seems a slightly better fit, but I don't think it would occur to me to look there for cluster manager integrations.

Would it be appropriate to remove the Kubernetes and Nomad items from the list of supported cluster managers, and instead follow the list with a short line that links to these integrations but makes their unsupported status clear? Something like:

> Third-party projects (not supported by the Spark project) exist to add support for [Kubernetes](https://github.com/apache-spark-on-k8s/) and [Nomad](https://github.com/hashicorp/nomad-spark) as cluster managers.
[GitHub] spark pull request #18209: [SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend
Github user barnardb closed the pull request at:

https://github.com/apache/spark/pull/18209
[GitHub] spark pull request #19354: [SPARK-20992][Scheduler] Add links in documentation to Nomad integration
GitHub user barnardb opened a pull request:

https://github.com/apache/spark/pull/19354

[SPARK-20992][Scheduler] Add links in documentation to Nomad integration.

## What changes were proposed in this pull request?

Adds links to the fork that provides integration with Nomad, in the same places the k8s integration is linked to.

## How was this patch tested?

I clicked on the links to make sure they're correct ;)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hashicorp/nomad-spark link-to-nomad-integration

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19354.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19354

commit 3369b8d75486f8045ba5e409feb6f032623f37cb
Author: Ben Barnard <barna...@gmail.com>
Date:   2017-09-11T10:39:53Z

    Add links in documentation to Nomad integration.
[GitHub] spark pull request #18209: [SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend
GitHub user barnardb opened a pull request:

https://github.com/apache/spark/pull/18209

[SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend

## What changes were proposed in this pull request?

Adds support for [Nomad](https://github.com/hashicorp/nomad) as a scheduler backend. Nomad is a cluster manager designed for both long-lived services and short-lived batch processing workloads. The integration supports client and cluster mode, dynamic allocation (increasing only), has basic support for Python and R applications, and works with applications packaged either as JARs or as Docker images.

Documentation is in [docs/running-on-nomad.md](https://github.com/barnardb/spark/blob/nomad/docs/running-on-nomad.md). This will be [presented at Spark Summit 2017](https://spark-summit.org/2017/events/homologous-apache-spark-clusters-using-nomad/). A build of the pull request with Nomad support is available [here](https://www.dropbox.com/s/llcv388yl5hweje/spark-2.3.0-SNAPSHOT-bin-nomad.tgz?dl=0). Feedback would be much appreciated.

## How was this patch tested?

This patch was tested with integration tests and manual tests, and a load test was performed to ensure it doesn't have worse performance than the YARN integration. The feature was developed and tested against Nomad 0.5.6 (the current stable version) on Spark 2.1.0, rebased to 2.1.1 and retested, and finally rebased to master and retested.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/barnardb/spark nomad

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18209.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18209

commit c762194188e64cccff8a9758885b45f9d395cced
Author: Ben Barnard <barna...@gmail.com>
Date:   2017-06-06T01:19:35Z

    Add support for Nomad as a scheduler backend
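To give a feel for usage: submitting an application looks like any other spark-submit invocation, pointed at a Nomad cluster. The command below is illustrative only; the exact master URL format, cluster address configuration, and spark.nomad.* properties are described in the running-on-nomad.md linked above, and the application jar here is a placeholder URL that would need to be reachable from the cluster:

    $ ./bin/spark-submit \
        --master nomad \
        --deploy-mode cluster \
        --class org.apache.spark.examples.SparkPi \
        http://example.com/jars/spark-examples.jar 100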
[GitHub] spark issue #17551: [SPARK-20242][Web UI] Add spark.ui.stopDelay
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/17551

> It's still running your code, right? Why can't you add a configuration to your own code that tells it to wait some time before shutting down the SparkContext?

We're trying to support arbitrary jobs running on the cluster, to make it easy for users to inspect the jobs that they run there. This was a quick way to achieve that, but I agree with the other commenters that this is quite hacky, and that the history server would be a nicer solution. Our problem with the history server right now is that while the current driver-side `EventLoggingListener` + history-server-side `FsHistoryProvider` implementations are great for environments with HDFS, they're much less convenient in a cluster without a distributed filesystem. I'd propose that I close this PR, and work on an RPC-based listener-provider combination to use with the history server.
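To make the proposal concrete, here's a rough sketch of the driver side, assuming an HTTP endpoint on the history server. The endpoint, path, and payload encoding are all made up for illustration; only `SparkFirehoseListener` (which funnels every listener callback into a single `onEvent`) is an existing Spark API here:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

import org.apache.spark.SparkFirehoseListener
import org.apache.spark.scheduler.SparkListenerEvent

// Rough sketch: forward every driver event to a remote history server
// over HTTP instead of writing an event log to a shared filesystem.
// The endpoint and payload are placeholders; a real implementation
// would reuse the JSON encoding used for event logs so that replay on
// the history-server side keeps working.
class RemoteEventForwarder extends SparkFirehoseListener {
  private val endpoint = new URL("http://history-server.example.com:18080/api/v1/events")

  override def onEvent(event: SparkListenerEvent): Unit = {
    val conn = endpoint.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val out = conn.getOutputStream
    try {
      out.write(event.toString.getBytes(StandardCharsets.UTF_8)) // placeholder serialization
    } finally {
      out.close()
    }
    conn.getResponseCode // block until the event is accepted
    conn.disconnect()
  }
}
```

Registered via `spark.extraListeners`, this would stream events live from the driver. A real implementation would buffer and batch rather than blocking the listener bus on every POST, and the history server would need a matching provider that accepts events over the wire instead of scanning a log directory the way `FsHistoryProvider` does.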
[GitHub] spark issue #17551: [SPARK-20242][Web UI] Add spark.ui.stopDelay
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/17551

Our use case involves jobs running in a remote cluster without a Spark master. I agree that the history server is the better way to solve this, but we'd like a solution that doesn't depend on a distributed filesystem. Would you be willing to consider a pull request for https://issues.apache.org/jira/browse/SPARK-19802 (sending events to a remote history server) instead?
[GitHub] spark pull request #17551: [SPARK-20242][Web UI] Add spark.ui.stopDelay
GitHub user barnardb opened a pull request:

https://github.com/apache/spark/pull/17551

[SPARK-20242][Web UI] Add spark.ui.stopDelay

## What changes were proposed in this pull request?

Adds a spark.ui.stopDelay configuration property that can be used to keep the UI running when an application has finished. This is very useful for debugging, especially when the driver application is running remotely.

## How was this patch tested?

This patch was tested manually. E.g. here's a screenshot from `bin/spark-submit run-example --conf spark.ui.stopDelay=30s SparkPi 100`:

![image](https://cloud.githubusercontent.com/assets/151714/24754984/c5b1657e-1ad8-11e7-99c8-982919afc94e.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/barnardb/spark ui-defer-stop-SPARK-20242

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17551.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17551

commit aa59599053b8d786bf0d63c10bd0bc4bdf2bcfa4
Author: Ben Barnard <barna...@gmail.com>
Date:   2017-04-06T09:41:20Z

    [SPARK-20242][Web UI] Add spark.ui.stopDelay

    This property can be used to keep the UI running when an application
    has finished. This can be very useful for debugging.
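For reviewers who'd rather not open the patch, the change boils down to something like the following. This is a simplified sketch, not the actual diff: the helper and its parameters are illustrative, while `spark.ui.stopDelay` is the key this PR adds and `getTimeAsMs` is the standard SparkConf duration parser:

```scala
import org.apache.spark.SparkConf

// Simplified sketch of the idea: before the driver tears down the web
// UI, wait for the configured delay so the finished application's
// pages stay reachable for inspection.
def stopUiAfterDelay(conf: SparkConf, stopUi: () => Unit): Unit = {
  // Accepts the usual Spark duration suffixes, e.g. "30s" or "5m";
  // defaults to no delay so existing behaviour is unchanged.
  val delayMs = conf.getTimeAsMs("spark.ui.stopDelay", "0s")
  if (delayMs > 0) {
    Thread.sleep(delayMs) // keep serving the UI pages, then shut down
  }
  stopUi()
}
```

Defaulting to "0s" means applications that don't set the property see no change in shutdown behaviour.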