[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21420


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21420#discussion_r191011438
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
         // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
         args.mainClass = "org.apache.spark.deploy.PythonRunner"
         args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-        if (clusterManager != YARN) {
-          // The YARN backend distributes the primary file differently, so don't merge it.
-          args.files = mergeFileLists(args.files, args.primaryResource)
-        }
       }
       if (clusterManager != YARN) {
         // The YARN backend handles python files differently, so don't merge the lists.
         args.files = mergeFileLists(args.files, args.pyFiles)
       }
-      if (localPyFiles != null) {
+    }
+
+    if (localPyFiles != null) {
       sparkConf.set("spark.submit.pyFiles", localPyFiles)
--- End diff --

Looks indented too far now.
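
For reference, a minimal sketch of how the moved block would look once re-indented to match its new, shallower nesting level:

    if (localPyFiles != null) {
      sparkConf.set("spark.submit.pyFiles", localPyFiles)
    }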


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21420#discussion_r191011981
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1093,6 +1097,44 @@ class SparkSubmitSuite
     assert(exception.getMessage() === "hello")
   }
 
+  test("support --py-files/spark.submit.pyFiles in non pyspark application") {
+    val hadoopConf = new Configuration()
+    updateConfWithFakeS3Fs(hadoopConf)
+
+    val tmpDir = Utils.createTempDir()
+    val pyFile = File.createTempFile("tmpPy", ".egg", tmpDir)
+
+    val args = Seq(
+      "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
+      "--name", "testApp",
+      "--master", "yarn",
+      "--deploy-mode", "client",
+      "--py-files", s"s3a://${pyFile.getAbsolutePath}",
+      "spark-internal"
+    )
+
+    val appArgs = new SparkSubmitArguments(args)
+    val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs, conf = Some(hadoopConf))
+
+    conf.get("spark.yarn.dist.pyFiles") should be (s"s3a://${pyFile.getAbsolutePath}")
+    conf.get("spark.submit.pyFiles") should (startWith("/"))
+
+    // Verify "spark.submit.pyFiles"
+    val args1 = Seq(
+      "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
+      "--name", "testApp",
+      "--master", "yarn",
+      "--deploy-mode", "client",
+      "--conf", s"spark.submit.pyFiles=s3a://${pyFile.getAbsolutePath}",
+      "spark-internal"
+    )
+
+    val appArgs1 = new SparkSubmitArguments(args1)
+    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1, conf = Some(hadoopConf))
+
+    conf1.get("spark.yarn.dist.pyFiles") should be (s"s3a://${pyFile.getAbsolutePath}")
--- End diff --

use `PY_FILES.key`, also in other places.
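
A hedged sketch of that suggestion, assuming `PY_FILES` here refers to the existing `ConfigEntry` constant in `org.apache.spark.internal.config` that backs this key:

    import org.apache.spark.internal.config._

    // Reference the key through its ConfigEntry rather than a string literal.
    conf.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")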


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21420#discussion_r190783462
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
         // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
         args.mainClass = "org.apache.spark.deploy.PythonRunner"
         args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-        if (clusterManager != YARN) {
-          // The YARN backend distributes the primary file differently, so don't merge it.
-          args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --

It is duplicated by the code below; you can check the original code.


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21420#discussion_r190783213
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
         // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
         args.mainClass = "org.apache.spark.deploy.PythonRunner"
         args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-        if (clusterManager != YARN) {
-          // The YARN backend distributes the primary file differently, so don't merge it.
-          args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --

Eh @jerryshao, why did we remove this?


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-24 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/21420

[SPARK-24377][Spark Submit] make --py-files work in non pyspark application

## What changes were proposed in this pull request?

Some Spark applications, even though they are Java programs, require not only jar dependencies but also Python dependencies. One example is the Livy remote SparkContext application: it is essentially an embedded REPL for Scala/Python/R, so it loads not only jar dependencies but also Python and R deps, which means we should be able to specify not only "--jars" but also "--py-files".

Currently, --py-files only works for a PySpark application, so it does not work in the case above. This PR proposes to remove that restriction.

We also found that "spark.submit.pyFiles" supports only a quite limited scenario (client mode with local deps), so this PR additionally expands "spark.submit.pyFiles" to be an alternative to --py-files.

## How was this patch tested?

UT added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-24377

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21420


commit a41c99bf311aa8f4e0c2e07c1288f5a11e057ea4
Author: jerryshao 
Date:   2018-05-24T06:53:23Z

make --py-files work in non pyspark application




---
