Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/601#discussion_r12173906
--- Diff: docs/running-on-yarn.md ---
@@ -12,12 +12,14 @@ was added to Spark in version 0.6.0, and improved in
0.7.0 and 0.8.0.
We need a consolidated Spark JAR (which bundles all the required
dependencies) to run Spark jobs on a YARN cluster.
--- End diff --
This section is actually redundant with the "Building with Maven" section
about Hadoop versions. Maybe it would make sense to just have a short
introduction here that explains that (i) you need a version of Spark that is
specially compiled with YARN support, and (ii) if you don't have one, you
can go to the Maven build docs to learn how to make one.
I think right now, if users come to this section, the first thing they'll
think is that they have to go build Spark themselves. But in almost all cases
they can just download the pre-built YARN jar and be done with it. I think
the first draft of this document was written before we packaged a binary
with YARN support.