[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-09-07 Thread adambalogh
Github user adambalogh commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r215947975
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) {
 
 addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
 addToClassPath(cp, getenv("YARN_CONF_DIR"));
+addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

@vanzin Did you have time to think about how this config should work?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-31 Thread adambalogh
Github user adambalogh commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r214428426
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) {
 
 addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
 addToClassPath(cp, getenv("YARN_CONF_DIR"));
+addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

Yes, it is quite tricky. My expectation is that it would behave the same 
way as if you pointed `HADOOP_CONF_DIR` and `YARN_CONF_DIR` to different 
directories that both contain `hdfs-site.xml`. Files in `HADOOP_CONF_DIR` would 
take precedence (as far as I know, nothing prevents this from happening). So 
with this new config, the order of preference would be `HADOOP_CONF_DIR`, 
`YARN_CONF_DIR`, then `spark.yarn.conf.dir`. 

Perhaps this could be clarified in the docs,  but let me know what you 
think about it, I'm happy to implement other resolutions.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r214414143
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) {
 
 addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
 addToClassPath(cp, getenv("YARN_CONF_DIR"));
+addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

Saisai's question about the classpath configuration is actually the most 
complicated part of this feature. I haven't fully thought about how they would 
play out, but I really don't think it's as simple as appending this new config 
to the classpath.

e.g. what is the expectation if you run "spark-shell" with this option? Do 
you end up using the config from the env variable or from the config? If you 
have both, and you reference a file in `--files` that is on an HDFS namespace 
declared in the `hdfs-site.xml` from the config, what will happen? (Answer: it 
will be ignored, since that is being masked by the `hdfs-site.xml` from the env 
variable.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-31 Thread adambalogh
Github user adambalogh commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r214401744
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) {
 
 addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
 addToClassPath(cp, getenv("YARN_CONF_DIR"));
+addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

@jerryshao My understanding is that this method is not used by the 
InProcessLauncher. So instead, the caller of InProcessLauncher has to make sure 
that the conf files are available to hadoop's Configuration class in the 
YarnClusterApplication. For example, by adding the config files to the calling 
thread's context class loader


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-31 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r214381467
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) {
 
 addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
 addToClassPath(cp, getenv("YARN_CONF_DIR"));
+addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

On another note, is this meant to extend to other resource-managers? As 
Kubernetes assumes only the ENV `HADOOP_CONF_DIR`, but if such a change is 
desirable this would cause a slight re-work of the current Hadoop Conf Files 
mounting logic. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-30 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22289#discussion_r214233802
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) {
 
 addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
 addToClassPath(cp, getenv("YARN_CONF_DIR"));
+addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir"));
--- End diff --

I'm wondering how do we update the classpath to change to another hadoop 
confs with InProcessLauncher? Seems the classpath here is not changeable after 
JVM is launched.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...

2018-08-30 Thread adambalogh
GitHub user adambalogh opened a pull request:

https://github.com/apache/spark/pull/22289

[SPARK-25200][YARN] Allow specifying HADOOP_CONF_DIR as spark property

## What changes were proposed in this pull request?

When submitting applications to Yarn in cluster mode, using the 
InProcessLauncher, spark finds the cluster's configuration files based on the 
HADOOP_CONF_DIR/YARN_CONF_DIR environment variables. This does not make it 
possible to submit to more than one Yarn clusters concurrently using the 
InProcessLauncher. 

This PR adds a new property `spark.yarn.conf.dir` that lets users select 
the location of the config files for each submission separately.

## How was this patch tested?

Integration test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adambalogh/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22289


commit f285081c8307e090638213b4e26cf5c35072b0e0
Author: Adam Balogh 
Date:   2018-08-30T13:05:43Z

read hadoop config dir from spark property

commit ae41a6603d22e66367d9d25890cf888737ca5db7
Author: Adam Balogh 
Date:   2018-08-30T13:08:36Z

Merge remote-tracking branch 'upstream/master'

commit 503c9859c37afbf753ee70459c122199d4942af2
Author: Adam Balogh 
Date:   2018-08-30T14:12:47Z

renames

commit 5622511769a6422d6268163fa7e77981e904b732
Author: Adam Balogh 
Date:   2018-08-30T18:31:08Z

test

commit d60736acc9b346c75669f924f34a29add855c53b
Author: Adam Balogh 
Date:   2018-08-30T18:39:26Z

remove SPARK_TEST_HADOOP_CONF_DIR

commit d690b26956323144f8eeb76e79467920ca7dde23
Author: Adam Balogh 
Date:   2018-08-30T18:40:17Z

move conf.dir to end of classpath

commit bed3f447886caa5419575d8cbc5f29e4ab8ab9cf
Author: Adam Balogh 
Date:   2018-08-30T18:45:06Z

use right prop in test




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org