dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r441985622



##########
File path: docs/running-on-yarn.md
##########
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,

Review comment:
       Okay, I'm getting old and my memory is bad :) I just vaguely remembered you telling me this, and that it was why our internal `no-hadoop` build cannot support Hive. I removed those lines. Thanks.
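
For reference, the override described in the diff above could be passed at submit time roughly as follows. This is only a minimal sketch: the helper name `build_submit_command` and the jar name `my-app.jar` are hypothetical, and the only setting taken from the diff is `spark.yarn.populateHadoopClasspath`.

```python
import shlex


def build_submit_command(app_jar, populate_hadoop_classpath):
    """Build a spark-submit argument list for a YARN cluster-mode job.

    Hypothetical helper for illustration; it only assembles arguments
    and does not launch anything.
    """
    flag = "true" if populate_hadoop_classpath else "false"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Setting named in the diff: controls whether Yarn's classpath
        # is populated into Spark.
        "--conf", f"spark.yarn.populateHadoopClasspath={flag}",
        app_jar,  # placeholder application jar
    ]


# Example: opt in to Yarn's classpath on a `with-hadoop` distribution.
cmd = build_submit_command("my-app.jar", populate_hadoop_classpath=True)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting concerns.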






