[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-05-08 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21095
  
Thanks @liyinan926 for the review and @andrusha for addressing the comments 
and moving the work along. I can close this PR once #21260 merges.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-30 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21095
  
Hi @foxish, I don't see why the 2 testing concerns should block reviewing 
this PR.
1. This PR does not attempt to address non-hostpath volumes (both 
implementation and unit tests are hence not included).
2. I agree e2e tests do make sense. However, e2e tests should not block 
this PR in any way. e2e tests can also be added once the PR is reviewed?




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-26 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21095
  
Hey @liyinan926, do you think you'll have time to take a look this week? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-19 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21095
  
@liyinan926 Thanks for the comment. I refactored it to address your 
concern. The code is simple enough to be easily refactored if the need be by 
whoever implements the next volume type.

Note that the goal of this PR is limited to hostPath volumes. Even though 
the implementation here is general enough to be extended, I will not attempt to 
add other volume types in this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-18 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21095
  
Thanks @foxish for your feedback. As a first time contributor to Spark, I 
would like to limit the scope of the changes in this PR. Let me know when 
you're ready to review again. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-18 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21095
  
@liyinan926 @foxish Please take a look


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-18 Thread madanadit
GitHub user madanadit opened a pull request:

https://github.com/apache/spark/pull/21095

[SPARK-23529][K8s] Support mounting hostPath volumes

## What changes were proposed in this pull request?

This PR introduces a new config `spark.kubernetes.driver/executor.volumes` 
taking a values of the format documented 
[here](https://docs.google.com/document/d/15-mk7UnOYNTXoF6EKaVlelWYc9DTrTXrYoodwDuAwY4/edit?usp=sharing)

The use case is to enable short-circuit writes to distributed storage on 
k8s. The Alluxio File System uses domain sockets to enable short-circuit writes 
from the client to worker memory when co-located on the same host machine. A 
directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker 
container as well as the Alluxio client ( = Spark executor) container. The 
worker creates a domain socket /tmp/domain/d and if the client container mounts 
the same directory, it can write directly to the Alluxio worker w/o passing 
through network stack. The end result is faster data access when data is local.

## How was this patch tested?

Manual testing on a k8s v1.8 cluster. Unit tests added to 
`Driver/ExecutorPodFactorySuite`.

This PR replaces https://github.com/apache/spark/pull/21032.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madanadit/spark k8s-vol

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21095


commit 51a877a7d86b6c0f18d381ba9300de18ccef9189
Author: madanadit <adit@...>
Date:   2018-04-05T01:02:25Z

Support mounting hostPath volumes for executors

commit 4ada724ab2f6e79553fa05475b216884f9ba127a
Author: madanadit <adit@...>
Date:   2018-04-05T20:57:45Z

Read mode for mounted volumes

commit 17258a3967b6d8b2170c3c7d7186449c3155fef1
Author: madanadit <adit@...>
Date:   2018-04-05T22:05:07Z

Refactor

commit 6fff716cbe42e184059660cf2009f594048a6420
Author: madanadit <adit@...>
Date:   2018-04-06T18:18:09Z

Fix style

commit f961b33aa34ad7908ac6d77ded251e18bf6e835a
Author: madanadit <adit@...>
Date:   2018-04-06T19:57:45Z

Add unit tests

commit af4d9baae8d637215d2be645b377fa5dbd714e94
Author: madanadit <adit@...>
Date:   2018-04-06T20:00:18Z

Update comment

commit 13e10493fa6f3b40c3eb406d0dc2f88d09ce20b8
Author: madanadit <adit@...>
Date:   2018-04-06T23:56:31Z

Fix unit tests

commit 7a25d76e386c0d8157eba0ccc0517bea3b7f7f0e
Author: madanadit <adit@...>
Date:   2018-04-16T22:41:26Z

Fix typo

commit a5a277a065dec156e976478de7767062bcf1da13
Author: madanadit <adit@...>
Date:   2018-04-17T21:51:58Z

Change configuration format

commit 772128e13895d6f313efa3cbc38947db51c3e493
Author: madanadit <adit@...>
Date:   2018-04-17T23:34:37Z

Fix build

commit a393b92c93d60bdcf96025ae24d6c1fbecf17f9b
Author: madanadit <adit@...>
Date:   2018-04-17T23:44:08Z

Fix test

commit b9b3dcb22ded628f0abe1caa0beb2b15da6ccc49
Author: madanadit <adit@...>
Date:   2018-04-18T00:39:14Z

Fetch properties correctly

commit 81a7811b06a0195e84ffeddc135875da6c500a7e
Author: madanadit <adit@...>
Date:   2018-04-18T05:21:11Z

Fix test cases

commit 95d1b0d8c681cd46f47c8ab1692172d0b3b0aba8
Author: madanadit <adit@...>
Date:   2018-04-18T06:42:56Z

    Abstract tests

commit facde97b365a7acb02c41e5ef076a9ea0f1edff9
Author: madanadit <adit@...>
Date:   2018-04-18T06:59:43Z

Add readOnly option

commit ccdc7990ca8995ff86f46647f8a2949848f06380
Author: madanadit <adit@...>
Date:   2018-04-18T07:06:17Z

Fix test

commit 7c1be8aff51a462bf96012fafbbfec765424de53
Author: madanadit <adit@...>
Date:   2018-04-18T07:24:55Z

Driver hostPath volumes with tests




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-16 Thread madanadit
Github user madanadit closed the pull request at:

https://github.com/apache/spark/pull/21032


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-13 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21032
  
@liyinan926 @foxish I've started a doc 
[here](https://docs.google.com/document/d/15-mk7UnOYNTXoF6EKaVlelWYc9DTrTXrYoodwDuAwY4/edit?usp=sharing).
 Feel free to edit and circulate to interested members of the community.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-13 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21032
  
@liyinan926 Sounds good


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-12 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21032
  
@foxish @liyinan926 Thanks for your comments. To make the configuration 
more general, here is an initial proposal. 

Context:
- volumeMount requires `[name, mountPath, readWriteMode]`
- volume requires `[name, type, [type_specific_options…]]`
for `type=hostPath`, `type_specific_options=[path, 
type=[Directory,File,…]]`

Proposed format :
```properties

spark.kubernetes.executor.volumes.[type].[name].mount=mountPath:readWriteMode

spark.kubernetes.executor.volumes.[type].[name].options.[type_specific_option1]=[value]
```
where, the format for value can be chosen based on the volume type/option. 
Nested options can be easily supported using this format. For instance, for 
hostPath volumes the config would look like:

```properties
spark.kubernetes.executor.volumes.hostPath.hostPath-1.mount=/opt/domain:ro

spark.kubernetes.executor.volumes.hostPath.hostPath-1.options.path=/tmp/domain
spark.kubernetes.executor.volumes.hostPath.hostPath-1.options.type=Directory
```
Alternatively, the volume type can be a top level property like 
`spark.kubernetes.executor.volumes.[name]=[type]`. Thoughts?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-12 Thread madanadit
Github user madanadit commented on a diff in the pull request:

https://github.com/apache/spark/pull/21032#discussion_r181197979
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -117,6 +117,13 @@ private[spark] object Config extends Logging {
   .stringConf
   .createWithDefault("spark")
 
+  val KUBERNETES_EXECUTOR_VOLUMES =
--- End diff --

Yes. I'll add the driver option after I rebase on 
[#20910](https://github.com/apache/spark/pull/20910) then.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20989: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-10 Thread madanadit
Github user madanadit closed the pull request at:

https://github.com/apache/spark/pull/20989


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-10 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/21032
  
@felixcheung, @foxish can you take a look? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-10 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/20989
  
Thanks  @felixcheung!. I've re targeted to master in this 
[PR](https://github.com/apache/spark/pull/21032). Closing this one.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-10 Thread madanadit
GitHub user madanadit opened a pull request:

https://github.com/apache/spark/pull/21032

[SPARK-23529][K8s] Support mounting hostPath volumes for executors

## What changes were proposed in this pull request?

This PR introduces a new config `spark.kubernetes.executor.volumes` taking 
a values of the form  `hostPath:containerPath[:ro|rw][,...]`; where  `hostPath` 
is the path for the executor pod volume, `containerPath` is the mount path and 
`ro` is read-only mode.

The use case is to enable short-circuit writes to distributed storage on 
k8s. The Alluxio File System uses domain sockets to enable short-circuit writes 
from the client to worker memory when co-located on the same host machine. A 
directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker 
container as well as the Alluxio client ( = Spark executor) container. The 
worker creates a domain socket /tmp/domain/d and if the client container mounts 
the same directory, it can write directly to the Alluxio worker w/o passing 
through network stack. The end result is faster data access when data is local.

## How was this patch tested?

Manual testing on a k8s v1.8 cluster. Unit tests added to 
`ExecutorPodFactorySuite`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madanadit/spark k8s-vols

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21032


commit a7edf6cf71e66caaec117335bb569f9ab93967f7
Author: madanadit <adit@...>
Date:   2018-04-05T01:02:25Z

Support mounting hostPath volumes for executors

commit 52f0d1cdcf125b3ba327a40360481eb4568991e2
Author: madanadit <adit@...>
Date:   2018-04-05T20:57:45Z

Read mode for mounted volumes

commit 70d050ea922d51f40384b7e473fe15dcfb5f4447
Author: madanadit <adit@...>
Date:   2018-04-05T22:05:07Z

Refactor

commit 463e9b5c227a42c4ae02f6e25475606ed66b50df
Author: madanadit <adit@...>
Date:   2018-04-06T18:18:09Z

Fix style

commit 0a21a19aca92a9409df0c03e24a8aa99d4dcd5b6
Author: madanadit <adit@...>
Date:   2018-04-06T19:57:45Z

Add unit tests

commit 87bcc9f638cc0720e00de86f46a3e677a8fc54c8
Author: madanadit <adit@...>
Date:   2018-04-06T20:00:18Z

Update comment

commit c72cbc1bd8a5043f5dbc55fa7ba0631c694a9317
Author: madanadit <adit@...>
Date:   2018-04-06T23:56:31Z

Fix unit tests




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-09 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/20989
  
@foxish The failure seems to be because of `SparkRemoteFileTest` missing 
from branch-2.3 (only present in master branch). Which branch do you recommend 
targeting this PR towards?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-05 Thread madanadit
Github user madanadit commented on the issue:

https://github.com/apache/spark/pull/20989
  
@foxish Can you take a look?. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20989: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-05 Thread madanadit
GitHub user madanadit opened a pull request:

https://github.com/apache/spark/pull/20989

[SPARK-23529][K8s] Support mounting hostPath volumes for executors

## What changes were proposed in this pull request?

This PR introduces a new config `spark.kubernetes.executor.volumes` taking 
a values of the form  `hostPath:containerPath[:ro|rw][,...]`; where  `hostPath` 
is the path for the executor pod volume, `containerPath` is the mount path and 
`ro` is read-only mode.

The use case is to enable short-circuit writes to distributed storage on 
k8s. The Alluxio File System uses domain sockets to enable short-circuit writes 
from the client to worker memory when co-located on the same host machine. A 
directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker 
container as well as the Alluxio client ( = Spark executor) container. The 
worker creates a domain socket /tmp/domain/d and if the client container mounts 
the same directory, it can write directly to the Alluxio worker w/o passing 
through network stack. The end result is faster data access when data is local.

## How was this patch tested?

Manual testing on a k8s v1.8 cluster.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madanadit/spark k8s-volumes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20989


commit addb8fec7c7fcd1535cfdc29594d74b04981ae19
Author: madanadit <adit@...>
Date:   2018-04-05T01:02:25Z

Spark executor volumes configuration

commit f33eaf53b914c3414a4d4a2bca9c5bc3d84aa268
Author: madanadit <adit@...>
Date:   2018-04-05T20:05:48Z

Fix style

commit 4016d6d33c296823e338f28deb7815c4be6607a1
Author: madanadit <adit@...>
Date:   2018-04-05T20:12:28Z

Undo unintended style change

commit c5b6c0bdc0ac46e87d4ca41ab27b81be62beee6f
Author: madanadit <adit@...>
Date:   2018-04-05T20:57:45Z

Read mode for mounted volumes

commit 17cfdf3afede72efd970c1d3ac92105257c1f63c
Author: madanadit <adit@...>
Date:   2018-04-05T22:05:07Z

Refactor




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org