[GitHub] spark issue #19354: [SPARK-20992][Scheduler] Add links in documentation to N...

2017-10-16 Thread barnardb
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/19354
  
I've updated the PR to leave the Kubernetes links as they are and just add a
link to the Nomad integration project, with wording indicating that it's
not supported by the Spark project.


---




[GitHub] spark issue #19354: [SPARK-20992][Scheduler] Add links in documentation to N...

2017-10-13 Thread barnardb
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/19354
  
I totally understand the reluctance to have non-ASF projects in a list 
headed by "The system currently supports…". Looking at the [Powered 
By](https://spark.apache.org/powered-by.html) page, I don't think it's the 
best way to help users find third-party cluster integrations. The [Spark 
Packages](https://spark-packages.org) site @srowen mentions seems a slightly 
better fit, but it wouldn't occur to me to look there for 
cluster manager integrations. Would it be appropriate to remove the Kubernetes 
and Nomad items from the list of supported cluster managers, and instead follow 
the list with a short line that links to these integrations but makes their 
unsupported status clear? Something like:

> Third-party projects (not supported by the Spark project) exist to add 
support for [Kubernetes](https://github.com/apache-spark-on-k8s/) and 
[Nomad](https://github.com/hashicorp/nomad-spark) as cluster managers.


---




[GitHub] spark pull request #18209: [SPARK-20992][Scheduler] Add support for Nomad as...

2017-09-26 Thread barnardb
Github user barnardb closed the pull request at:

https://github.com/apache/spark/pull/18209


---




[GitHub] spark pull request #19354: [SPARK-20992][Scheduler] Add links in documentati...

2017-09-26 Thread barnardb
GitHub user barnardb opened a pull request:

https://github.com/apache/spark/pull/19354

[SPARK-20992][Scheduler] Add links in documentation to Nomad integration.

## What changes were proposed in this pull request?

Adds links to the fork that provides integration with Nomad, in the same 
places the k8s integration is linked to.

## How was this patch tested?

I clicked on the links to make sure they're correct ;)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hashicorp/nomad-spark link-to-nomad-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19354.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19354


commit 3369b8d75486f8045ba5e409feb6f032623f37cb
Author: Ben Barnard <barna...@gmail.com>
Date:   2017-09-11T10:39:53Z

Add links in documentation to Nomad integration.




---




[GitHub] spark pull request #18209: [SPARK-20992][Scheduler] Add support for Nomad as...

2017-06-05 Thread barnardb
GitHub user barnardb opened a pull request:

https://github.com/apache/spark/pull/18209

[SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend

## What changes were proposed in this pull request?

Adds support for [Nomad](https://github.com/hashicorp/nomad) as a scheduler 
backend. Nomad is a cluster manager designed for both long-lived services and 
short-lived batch-processing workloads.

The integration supports client and cluster mode and dynamic allocation 
(increasing only), has basic support for Python and R applications, and works 
with applications packaged either as JARs or as Docker images.
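
For illustration, here's a rough sketch of what driver-side configuration 
against this backend could look like. The `nomad` master URL and the 
`spark.nomad.dockerImage` property name are assumptions based on the fork's 
docs and may not match the integration exactly:

```scala
// Minimal sketch, not taken from this PR's diff. The "nomad" master URL
// and the spark.nomad.dockerImage property are assumed names from the
// fork's documentation and exist only in the fork, not in upstream Spark.
import org.apache.spark.sql.SparkSession

object NomadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NomadExample")
      .master("nomad") // assumed master URL scheme for the Nomad backend
      .config("spark.executor.instances", "2")
      .config("spark.nomad.dockerImage", "example/spark-nomad:latest") // hypothetical image name
      .getOrCreate()

    // A trivial job to exercise executors scheduled by Nomad.
    val evens = spark.sparkContext.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println(s"Counted $evens even numbers")
    spark.stop()
  }
}
```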

Documentation is in 
[docs/running-on-nomad.md](https://github.com/barnardb/spark/blob/nomad/docs/running-on-nomad.md).

This will be [presented at Spark Summit 
2017](https://spark-summit.org/2017/events/homologous-apache-spark-clusters-using-nomad/).

A build of the pull request with Nomad support is available 
[here](https://www.dropbox.com/s/llcv388yl5hweje/spark-2.3.0-SNAPSHOT-bin-nomad.tgz?dl=0).

Feedback would be much appreciated.

## How was this patch tested?

This patch was tested with integration and manual tests, and a load test 
was performed to ensure it doesn't perform worse than the YARN 
integration.

The feature was developed and tested against Nomad 0.5.6 (current stable 
version)
on Spark 2.1.0, rebased to 2.1.1 and retested, and finally rebased to 
master and retested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/barnardb/spark nomad

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18209


commit c762194188e64cccff8a9758885b45f9d395cced
Author: Ben Barnard <barna...@gmail.com>
Date:   2017-06-06T01:19:35Z

Add support for Nomad as a scheduler backend




---



[GitHub] spark issue #17551: [SPARK-20242][Web UI] Add spark.ui.stopDelay

2017-04-06 Thread barnardb
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/17551
  
> It's still running your code, right? Why can't you add a configuration to 
> your own code that tells it to wait some time before shutting down the 
> SparkContext?

We're trying to support arbitrary jobs running on the cluster, to make it 
easy for users to inspect the jobs that they run there. This was a quick way to 
achieve that, but I agree with the other commenters that this is quite hacky, and 
that the history server would be a nicer solution. Our problem with the history 
server right now is that while the current driver-side `EventLoggingListener` + 
history-server-side `FsHistoryProvider` implementations are great for 
environments with HDFS, they're much less convenient in a cluster without a 
distributed filesystem. I'd propose that I close this PR and work on an 
RPC-based listener-provider combination to use with the history server.
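
For illustration, a minimal sketch of what the driver side of that 
combination could look like, using the public `SparkFirehoseListener` hook. 
The endpoint URL is hypothetical and the serialization is a placeholder; a 
real implementation would mirror the JSON format `EventLoggingListener` 
writes to the event log:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

import org.apache.spark.SparkFirehoseListener
import org.apache.spark.scheduler.SparkListenerEvent

class RemoteHistoryListener extends SparkFirehoseListener {
  // Hypothetical endpoint; a real provider would define the actual API.
  private val endpoint = new URL("http://history-server.example:18080/api/v1/events")

  override def onEvent(event: SparkListenerEvent): Unit = {
    // Placeholder serialization; a real implementation would reuse the
    // JSON encoding used for the event log.
    val payload = event.toString.getBytes(StandardCharsets.UTF_8)
    val conn = endpoint.openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.getOutputStream.write(payload)
      conn.getResponseCode // send the request; ignore the response body
    } finally conn.disconnect()
  }
}
```

Such a listener could be registered with `spark.extraListeners`, and the 
history-server side would need a counterpart to `FsHistoryProvider`, 
configured via `spark.history.provider`.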


---



[GitHub] spark issue #17551: [SPARK-20242][Web UI] Add spark.ui.stopDelay

2017-04-06 Thread barnardb
Github user barnardb commented on the issue:

https://github.com/apache/spark/pull/17551
  
Our use case involves jobs running in a remote cluster without a Spark 
master. I agree that the history server is the better way to solve this, but we'd 
like a solution that doesn't depend on a distributed filesystem. Would 
you be willing to consider a pull request for 
https://issues.apache.org/jira/browse/SPARK-19802 (sending events to a remote 
history server) instead?


---



[GitHub] spark pull request #17551: [SPARK-20242][Web UI] Add spark.ui.stopDelay

2017-04-06 Thread barnardb
GitHub user barnardb opened a pull request:

https://github.com/apache/spark/pull/17551

[SPARK-20242][Web UI] Add spark.ui.stopDelay

## What changes were proposed in this pull request?

Adds a `spark.ui.stopDelay` configuration property that can be used to keep 
the UI running after an application has finished. This is very useful for 
debugging, especially when the driver application is running remotely.
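
For illustration only, a minimal sketch of how the proposed property would 
be used; note that `spark.ui.stopDelay` exists only in this patch, not in 
upstream Spark:

```scala
// Sketch of using the spark.ui.stopDelay property proposed in this PR.
// Upstream Spark ignores this property; it only has effect with the patch.
import org.apache.spark.sql.SparkSession

object StopDelayDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StopDelayDemo")
      .config("spark.ui.stopDelay", "30s") // proposed property from this PR
      .getOrCreate()

    spark.sparkContext.parallelize(1 to 100).count()
    spark.stop() // with this patch, the web UI stays reachable for another 30s
  }
}
```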

## How was this patch tested?

This patch was tested manually. For example, here's a screenshot after 
running `bin/spark-submit run-example --conf spark.ui.stopDelay=30s SparkPi 100`:


![image](https://cloud.githubusercontent.com/assets/151714/24754984/c5b1657e-1ad8-11e7-99c8-982919afc94e.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/barnardb/spark ui-defer-stop-SPARK-20242

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17551.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17551


commit aa59599053b8d786bf0d63c10bd0bc4bdf2bcfa4
Author: Ben Barnard <barna...@gmail.com>
Date:   2017-04-06T09:41:20Z

[SPARK-20242][Web UI] Add spark.ui.stopDelay

This property can be used to keep the UI running when an application
has finished. This can be very useful for debugging.




---