Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/5933#discussion_r29734022
--- Diff: docs/running-on-yarn.md ---
@@ -305,3 +305,4 @@ If you need a reference to the proper location to put
log files in the YARN so t
- In `yarn-cluster` mode, the local directories used by the Spark
executors and the Spark driver will be the local directories configured for
YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies
`spark.local.dir`, it will be ignored. In `yarn-client` mode, the Spark
executors will use the local directories configured for YARN while the Spark
driver will use those defined in `spark.local.dir`. This is because the Spark
driver does not run on the YARN cluster in `yarn-client` mode, only the Spark
executors do.
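
  As a sketch of the behavior described above (the flag values and app name
  here are hypothetical, not from the docs): in `yarn-client` mode, only the
  driver honors `spark.local.dir`, while executors use the directories from
  `yarn.nodemanager.local-dirs`.

  ```shell
  # Hypothetical invocation: the driver runs locally and uses
  # /mnt/driver-scratch for its scratch space; the executors on the
  # cluster ignore spark.local.dir and use YARN's configured local dirs.
  spark-submit \
    --master yarn-client \
    --conf spark.local.dir=/mnt/driver-scratch \
    my_app.py
  ```

  In `yarn-cluster` mode the same `--conf` would be ignored entirely, since
  the driver also runs inside the YARN cluster.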
- The `--files` and `--archives` options support specifying file names
with the `#` syntax, similar to Hadoop. For example, you can specify `--files
localtest.txt#appSees.txt`; this uploads the file you have locally named
`localtest.txt` into HDFS, but it will be linked to by the name `appSees.txt`,
so your application should use the name `appSees.txt` to reference it when
running on YARN.
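
  The aliasing described above can be sketched as follows (the application
  name is hypothetical):

  ```shell
  # Hypothetical example: localtest.txt is uploaded to HDFS, and each
  # container sees it in its working directory under the alias appSees.txt.
  spark-submit \
    --master yarn-cluster \
    --files localtest.txt#appSees.txt \
    my_app.py
  ```

  Inside the application, the file would then be opened by its alias, e.g.
  `open("appSees.txt")` in PySpark, rather than by its original local name.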
- The `--jars` option allows the `SparkContext.addJar` function to work if
you are using it with local files and running in `yarn-cluster` mode. It does
not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.
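
  A minimal sketch of the `--jars` usage described above, assuming a
  hypothetical jar path and main class:

  ```shell
  # Hypothetical sketch: a local jar listed in --jars is shipped to the
  # cluster, so code that calls SparkContext.addJar on it still works in
  # yarn-cluster mode. HDFS/HTTP/HTTPS/FTP URIs would not need this.
  spark-submit \
    --master yarn-cluster \
    --jars /path/to/extra-lib.jar \
    --class com.example.MyApp \
    my_app.jar
  ```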
+- In order to make PySpark work on YARN, please build Spark with Java 6.
--- End diff --
actually we should really fix this ...
---