Github user lianhuiwang commented on the pull request:

    https://github.com/apache/spark/pull/5580#issuecomment-97553254
  
    @vanzin the code below is the important part:
    pyArchives = pyArchives.split(",").map { localPath =>
      val localURI = Utils.resolveURI(localPath)
      if (localURI.getScheme != "local") {
        args.files = mergeFileLists(args.files, localURI.toString)
        (new Path(localPath)).getName
      } else {
        localURI.getPath.toString
      }
    }.mkString(File.pathSeparator)
    If an archive is not local, it is added to the distributed files so that
Yarn's Client can ship it to the nodes, and only the file name is kept, e.g.
for hdfs://xx:1234/user/pyspark.zip or file:/user/pyspark.zip. On Yarn's
NodeManager only the link name is used; in
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L315
the Client sets the file name as the distributed file's link name.
    The other case is an archive that is already local on the nodes, in which
case the local path itself is put on PYTHONPATH.
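    The two cases can be sketched as follows. This is a minimal standalone
illustration, not the actual SparkSubmit code: it uses java.net.URI in place
of Utils.resolveURI and java.io.File in place of Hadoop's Path, and the
helper name pythonPathEntry is made up for this sketch.

    ```scala
    import java.net.URI
    import java.io.File

    // Compute the PYTHONPATH entry for one archive. For a non-"local" URI,
    // only the file name is kept, because Yarn distributes the archive and
    // links it under that name on each node. For a "local" URI, the path is
    // used as-is, since it already exists on every node.
    def pythonPathEntry(localPath: String): String = {
      val uri = new URI(localPath)
      if (uri.getScheme != "local") {
        // e.g. "hdfs://xx:1234/user/pyspark.zip" -> "pyspark.zip"
        new File(uri.getPath).getName
      } else {
        // e.g. "local:/opt/libs/pyspark.zip" -> "/opt/libs/pyspark.zip"
        uri.getPath
      }
    }
    ```

    Joining the entries with File.pathSeparator then yields the PYTHONPATH
string, exactly as the mkString call above does.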

