Herman:
For "Pre-built with user-provided Hadoop": a release such as
spark-1.4.1-bin-hadoop2.6.tgz, for example, uses the hadoop-2.6 profile,
which defines the versions of the projects Spark depends on.
The Hadoop cluster provides storage (HDFS) and resource management
(YARN).
For the latter, please see:
https://spark.apache.org/docs/latest/running-on-yarn.html
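Following that page, a minimal sketch of submitting a job to an existing YARN cluster might look like the following. The config path and the examples jar location are assumptions; adjust them to your installation:

```shell
# Point Spark at the Hadoop client configuration
# (assumption: the configs live in /etc/hadoop/conf).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Submit the bundled SparkPi example to YARN.
# --master yarn-cluster is the Spark 1.x syntax for cluster deploy mode;
# the jar path below is an assumption based on the 1.4.1 binary layout.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-1.4.1-hadoop2.6.0.jar \
  10
```

With HADOOP_CONF_DIR set, Spark picks up the HDFS and YARN addresses from the cluster's own config files rather than needing them on the command line.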
Cheers
On Thu, Jul 30, 2015 at 1:48 AM, hermansc herman.schis...@gmail.com wrote:
Hi.
I want to run Spark, and more specifically the "Pre-built with
user-provided Hadoop" version from the downloads page, but I can't find any
documentation on how to connect the two components together (namely Spark
and Hadoop).
I've had some success setting SPARK_CLASSPATH to my Hadoop distribution's
lib/ directory, which contains jar files such as hadoop-core, hadoop-common,
etc.
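For the "without Hadoop" builds there is also the SPARK_DIST_CLASSPATH mechanism (available since Spark 1.4), which lets the launch scripts pick up the user-provided Hadoop jars without touching SPARK_CLASSPATH. A sketch, assuming your distribution's hadoop command is on the PATH:

```shell
# conf/spark-env.sh
# Have Spark's launch scripts include the user-provided Hadoop jars.
# Assumes the 'hadoop' command from your distribution is on PATH;
# 'hadoop classpath' prints the full client classpath.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

This keeps the Spark assembly Hadoop-free while the runtime classpath is filled in from whatever distribution is installed on the node.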
However, there seem to be many native libraries included in the assembly
jar of the Spark versions pre-built for Hadoop distributions (I'm
specifically missing the libsnappy.so files) that are not included by
default in distributions such as Cloudera Hadoop.
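For missing native libraries like libsnappy.so, one option is to point the driver and executors at the directory where the Hadoop distribution installs its native code. The path below is an assumption (often $HADOOP_HOME/lib/native, or a parcel directory on CDH); adjust it to your layout:

```
# conf/spark-defaults.conf
# Make Hadoop's native libraries (e.g. libsnappy.so, libhadoop.so)
# visible to the driver and executors. The path is an assumption;
# adjust to wherever your distribution installs them.
spark.driver.extraLibraryPath    /usr/lib/hadoop/lib/native
spark.executor.extraLibraryPath  /usr/lib/hadoop/lib/native
```

The same two properties can also be passed per job via spark-submit's --conf flag instead of the defaults file.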
Has anyone here actually tried to run Spark without Hadoop included in the
assembly jar, and/or does anyone have more resources where I can read about
the proper way of connecting them?
As an aside, the spark-assembly jar in the Spark version pre-built for
user-provided Hadoop distributions is named
spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should
be called spark-assembly-1.4.0-without-hadoop.jar :)
--
Herman
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-user-provided-Hadoop-installation-tp24076.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.