Re: Running Spark on user-provided Hadoop installation

2015-08-18 Thread gauravsehgal
Refer: http://spark.apache.org/docs/latest/hadoop-provided.html

Specifically, if you want to reference s3a:// paths, edit spark-env.sh and
add the following lines at the end:
# Put the user-provided Hadoop's jars on Spark's classpath
SPARK_DIST_CLASSPATH=$(/path/to/hadoop/hadoop-2.7.1/bin/hadoop classpath)
# Also append the Hadoop tools jars (hadoop-aws, aws-java-sdk, etc.), which provide s3a://
export SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:/path/to/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/*
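
A quick sanity check after editing spark-env.sh (a sketch only; the
credential values below are hypothetical placeholders):

# Confirm the hadoop-aws and AWS SDK jars are now visible to Spark
echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n' | grep -i aws

# Then read an s3a:// path from the shell; s3a credentials can be passed
# as Hadoop configuration through Spark's spark.hadoop.* prefix
./bin/spark-shell \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY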



Re: Running Spark on user-provided Hadoop installation

2015-07-30 Thread Ted Yu
Herman:
A regular pre-built package, e.g. spark-1.4.1-bin-hadoop2.6.tgz, is built
with the hadoop-2.6 profile, which defines the versions of the projects
Spark depends on; the "Pre-built with user-provided Hadoop" package omits
those jars so that your own Hadoop installation supplies them.
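
Concretely, the user-provided build picks up your Hadoop jars through
SPARK_DIST_CLASSPATH in conf/spark-env.sh (a minimal sketch, assuming the
hadoop launcher is on your PATH):

export SPARK_DIST_CLASSPATH=$(hadoop classpath)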

The Hadoop cluster provides storage (HDFS) and resource management (YARN).
For the latter, please see:
https://spark.apache.org/docs/latest/running-on-yarn.html
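
For example, a submission to YARN looks roughly like this (paths are
hypothetical; HADOOP_CONF_DIR must point at the directory holding
core-site.xml and yarn-site.xml):

export HADOOP_CONF_DIR=/path/to/hadoop/etc/hadoop
./bin/spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples*.jar 10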

Cheers

On Thu, Jul 30, 2015 at 1:48 AM, hermansc herman.schis...@gmail.com wrote:

 Hi.

 I want to run Spark, and more specifically the "Pre-built with
 user-provided Hadoop" version from the downloads page, but I can't find any
 documentation on how to connect the two components (namely Spark and Hadoop).

 I've had some success setting SPARK_CLASSPATH to my Hadoop distribution's
 lib/ directory, which contains jar files such as hadoop-core, hadoop-common,
 etc.

 However, there seem to be many native libraries included in the assembly
 jar for Spark versions pre-built for Hadoop distributions (I'm specifically
 missing the libsnappy.so files) that are not included by default in
 distributions such as Cloudera Hadoop.

 Has anyone here actually tried to run Spark without Hadoop included in the
 assembly jar, and/or does anyone have more resources where I can read about
 the proper way of connecting them?

 As an aside, the spark-assembly jar in the Spark version pre-built for
 user-provided Hadoop distributions is named
 spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should
 be called spark-assembly-1.4.0-without-hadoop.jar :)

 --
 Herman


