dbtsai commented on a change in pull request #28788:
URL: https://github.com/apache/spark/pull/28788#discussion_r440618141
##########
File path: docs/running-on-yarn.md
##########

```diff
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains a built-in Hadoop runtime, so we call it the `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant the `no-hadoop` Spark distribution. Since the `with-hadoop` Spark distribution
+already contains a built-in Hadoop runtime, by default, when a job is submitted to a Hadoop YARN cluster,
+it will not populate YARN's classpath into Spark, in order to prevent jar conflicts. To override this
+behavior, you can set <code>spark.yarn.populateHadoopClasspath=true</code>.
+The `no-hadoop` Spark distribution populates YARN's classpath by default in order to obtain the Hadoop runtime. Note that some features such
+as Hive support are not available in the `no-hadoop` Spark distribution. For the `with-hadoop` Spark distribution,
```

Review comment:

> Maybe I'm wrong, but I got the impression from @dongjoon-hyun that the `no-hadoop` Spark distribution doesn't support Hive.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org