[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21095 Thanks @liyinan926 for the review and @andrusha for addressing the comments and moving the work along. I can close this PR once #21260 merges. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21095 Hi @foxish, I don't see why the 2 testing concerns should block reviewing this PR. 1. This PR does not attempt to address non-hostpath volumes (both implementation and unit tests are hence not included). 2. I agree e2e tests do make sense. However, e2e tests should not block this PR in any way. e2e tests can also be added once the PR is reviewed? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21095 Hey @liyinan926, do you think you'll have time to take a look this week? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21095 @liyinan926 Thanks for the comment. I refactored it to address your concern. The code is simple enough to be easily refactored if the need be by whoever implements the next volume type. Note that the goal of this PR is limited to hostPath volumes. Even though the implementation here is general enough to be extended, I will not attempt to add other volume types in this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21095 Thanks @foxish for your feedback. As a first time contributor to Spark, I would like to limit the scope of the changes in this PR. Let me know when you're ready to review again. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21095 @liyinan926 @foxish Please take a look --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...
GitHub user madanadit opened a pull request: https://github.com/apache/spark/pull/21095 [SPARK-23529][K8s] Support mounting hostPath volumes ## What changes were proposed in this pull request? This PR introduces a new config `spark.kubernetes.driver/executor.volumes` taking a values of the format documented [here](https://docs.google.com/document/d/15-mk7UnOYNTXoF6EKaVlelWYc9DTrTXrYoodwDuAwY4/edit?usp=sharing) The use case is to enable short-circuit writes to distributed storage on k8s. The Alluxio File System uses domain sockets to enable short-circuit writes from the client to worker memory when co-located on the same host machine. A directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker container as well as the Alluxio client ( = Spark executor) container. The worker creates a domain socket /tmp/domain/d and if the client container mounts the same directory, it can write directly to the Alluxio worker w/o passing through network stack. The end result is faster data access when data is local. ## How was this patch tested? Manual testing on a k8s v1.8 cluster. Unit tests added to `Driver/ExecutorPodFactorySuite`. This PR replaces https://github.com/apache/spark/pull/21032. You can merge this pull request into a Git repository by running: $ git pull https://github.com/madanadit/spark k8s-vol Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21095.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21095 commit 51a877a7d86b6c0f18d381ba9300de18ccef9189 Author: madanadit <adit@...> Date: 2018-04-05T01:02:25Z Support mounting hostPath volumes for executors commit 4ada724ab2f6e79553fa05475b216884f9ba127a Author: madanadit <adit@...> Date: 2018-04-05T20:57:45Z Read mode for mounted volumes commit 17258a3967b6d8b2170c3c7d7186449c3155fef1 Author: madanadit <adit@...> Date: 2018-04-05T22:05:07Z Refactor commit 6fff716cbe42e184059660cf2009f594048a6420 Author: madanadit <adit@...> Date: 2018-04-06T18:18:09Z Fix style commit f961b33aa34ad7908ac6d77ded251e18bf6e835a Author: madanadit <adit@...> Date: 2018-04-06T19:57:45Z Add unit tests commit af4d9baae8d637215d2be645b377fa5dbd714e94 Author: madanadit <adit@...> Date: 2018-04-06T20:00:18Z Update comment commit 13e10493fa6f3b40c3eb406d0dc2f88d09ce20b8 Author: madanadit <adit@...> Date: 2018-04-06T23:56:31Z Fix unit tests commit 7a25d76e386c0d8157eba0ccc0517bea3b7f7f0e Author: madanadit <adit@...> Date: 2018-04-16T22:41:26Z Fix typo commit a5a277a065dec156e976478de7767062bcf1da13 Author: madanadit <adit@...> Date: 2018-04-17T21:51:58Z Change configuration format commit 772128e13895d6f313efa3cbc38947db51c3e493 Author: madanadit <adit@...> Date: 2018-04-17T23:34:37Z Fix build commit a393b92c93d60bdcf96025ae24d6c1fbecf17f9b Author: madanadit <adit@...> Date: 2018-04-17T23:44:08Z Fix test commit b9b3dcb22ded628f0abe1caa0beb2b15da6ccc49 Author: madanadit <adit@...> Date: 2018-04-18T00:39:14Z Fetch properties correctly commit 81a7811b06a0195e84ffeddc135875da6c500a7e Author: madanadit <adit@...> Date: 2018-04-18T05:21:11Z Fix test cases commit 95d1b0d8c681cd46f47c8ab1692172d0b3b0aba8 Author: madanadit <adit@...> Date: 2018-04-18T06:42:56Z Abstract tests commit facde97b365a7acb02c41e5ef076a9ea0f1edff9 Author: madanadit <adit@...> Date: 2018-04-18T06:59:43Z Add readOnly option commit ccdc7990ca8995ff86f46647f8a2949848f06380 Author: madanadit <adit@...> Date: 2018-04-18T07:06:17Z Fix test commit 7c1be8aff51a462bf96012fafbbfec765424de53 Author: madanadit <adit@...> Date: 2018-04-18T07:24:55Z Driver hostPath volumes with tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user madanadit closed the pull request at: https://github.com/apache/spark/pull/21032 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21032 @liyinan926 @foxish I've started a doc [here](https://docs.google.com/document/d/15-mk7UnOYNTXoF6EKaVlelWYc9DTrTXrYoodwDuAwY4/edit?usp=sharing). Feel free to edit and circulate to interested members of the community. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21032 @liyinan926 Sounds good --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21032 @foxish @liyinan926 Thanks for your comments. To make the configuration more general, here is an initial proposal. Context: - volumeMount requires `[name, mountPath, readWriteMode]` - volume requires `[name, type, [type_specific_optionsâ¦]]` for `type=hostPath`, `type_specific_options=[path, type=[Directory,File,â¦]]` Proposed format : ```properties spark.kubernetes.executor.volumes.[type].[name].mount=mountPath:readWriteMode spark.kubernetes.executor.volumes.[type].[name].options.[type_specific_option1]=[value] ``` where, the format for value can be chosen based on the volume type/option. Nested options can be easily supported using this format. For instance, for hostPath volumes the config would look like: ```properties spark.kubernetes.executor.volumes.hostPath.hostPath-1.mount=/opt/domain:ro spark.kubernetes.executor.volumes.hostPath.hostPath-1.options.path=/tmp/domain spark.kubernetes.executor.volumes.hostPath.hostPath-1.options.type=Directory ``` Alternatively, the volume type can be a top level property like `spark.kubernetes.executor.volumes.[name]=[type]`. Thoughts? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user madanadit commented on a diff in the pull request: https://github.com/apache/spark/pull/21032#discussion_r181197979 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -117,6 +117,13 @@ private[spark] object Config extends Logging { .stringConf .createWithDefault("spark") + val KUBERNETES_EXECUTOR_VOLUMES = --- End diff -- Yes. I'll add the driver option after I rebase on [#20910](https://github.com/apache/spark/pull/20910) then. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20989: [SPARK-23529][K8s] Support mounting hostPath volu...
Github user madanadit closed the pull request at: https://github.com/apache/spark/pull/20989 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/21032 @felixcheung, @foxish can you take a look? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/20989 Thanks @felixcheung!. I've re targeted to master in this [PR](https://github.com/apache/spark/pull/21032). Closing this one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...
GitHub user madanadit opened a pull request: https://github.com/apache/spark/pull/21032 [SPARK-23529][K8s] Support mounting hostPath volumes for executors ## What changes were proposed in this pull request? This PR introduces a new config `spark.kubernetes.executor.volumes` taking a values of the form `hostPath:containerPath[:ro|rw][,...]`; where `hostPath` is the path for the executor pod volume, `containerPath` is the mount path and `ro` is read-only mode. The use case is to enable short-circuit writes to distributed storage on k8s. The Alluxio File System uses domain sockets to enable short-circuit writes from the client to worker memory when co-located on the same host machine. A directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker container as well as the Alluxio client ( = Spark executor) container. The worker creates a domain socket /tmp/domain/d and if the client container mounts the same directory, it can write directly to the Alluxio worker w/o passing through network stack. The end result is faster data access when data is local. ## How was this patch tested? Manual testing on a k8s v1.8 cluster. Unit tests added to `ExecutorPodFactorySuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/madanadit/spark k8s-vols Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21032.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21032 commit a7edf6cf71e66caaec117335bb569f9ab93967f7 Author: madanadit <adit@...> Date: 2018-04-05T01:02:25Z Support mounting hostPath volumes for executors commit 52f0d1cdcf125b3ba327a40360481eb4568991e2 Author: madanadit <adit@...> Date: 2018-04-05T20:57:45Z Read mode for mounted volumes commit 70d050ea922d51f40384b7e473fe15dcfb5f4447 Author: madanadit <adit@...> Date: 2018-04-05T22:05:07Z Refactor commit 463e9b5c227a42c4ae02f6e25475606ed66b50df Author: madanadit <adit@...> Date: 2018-04-06T18:18:09Z Fix style commit 0a21a19aca92a9409df0c03e24a8aa99d4dcd5b6 Author: madanadit <adit@...> Date: 2018-04-06T19:57:45Z Add unit tests commit 87bcc9f638cc0720e00de86f46a3e677a8fc54c8 Author: madanadit <adit@...> Date: 2018-04-06T20:00:18Z Update comment commit c72cbc1bd8a5043f5dbc55fa7ba0631c694a9317 Author: madanadit <adit@...> Date: 2018-04-06T23:56:31Z Fix unit tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/20989 @foxish The failure seems to be because of `SparkRemoteFileTest` missing from branch-2.3 (only present in master branch). Which branch do you recommend targeting this PR towards? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...
Github user madanadit commented on the issue: https://github.com/apache/spark/pull/20989 @foxish Can you take a look?. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20989: [SPARK-23529][K8s] Support mounting hostPath volu...
GitHub user madanadit opened a pull request: https://github.com/apache/spark/pull/20989 [SPARK-23529][K8s] Support mounting hostPath volumes for executors ## What changes were proposed in this pull request? This PR introduces a new config `spark.kubernetes.executor.volumes` taking a values of the form `hostPath:containerPath[:ro|rw][,...]`; where `hostPath` is the path for the executor pod volume, `containerPath` is the mount path and `ro` is read-only mode. The use case is to enable short-circuit writes to distributed storage on k8s. The Alluxio File System uses domain sockets to enable short-circuit writes from the client to worker memory when co-located on the same host machine. A directory, lets say /tmp/domain on the host, is mounted on the Alluxio worker container as well as the Alluxio client ( = Spark executor) container. The worker creates a domain socket /tmp/domain/d and if the client container mounts the same directory, it can write directly to the Alluxio worker w/o passing through network stack. The end result is faster data access when data is local. ## How was this patch tested? Manual testing on a k8s v1.8 cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/madanadit/spark k8s-volumes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20989.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20989 commit addb8fec7c7fcd1535cfdc29594d74b04981ae19 Author: madanadit <adit@...> Date: 2018-04-05T01:02:25Z Spark executor volumes configuration commit f33eaf53b914c3414a4d4a2bca9c5bc3d84aa268 Author: madanadit <adit@...> Date: 2018-04-05T20:05:48Z Fix style commit 4016d6d33c296823e338f28deb7815c4be6607a1 Author: madanadit <adit@...> Date: 2018-04-05T20:12:28Z Undo unintended style change commit c5b6c0bdc0ac46e87d4ca41ab27b81be62beee6f Author: madanadit <adit@...> Date: 2018-04-05T20:57:45Z Read mode for mounted volumes commit 17cfdf3afede72efd970c1d3ac92105257c1f63c Author: madanadit <adit@...> Date: 2018-04-05T22:05:07Z Refactor --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org