What is a YARN cluster? And does Spark necessarily need Hadoop already installed on the cluster? For example, can one download Spark and run it on a bunch of nodes with no prior installation of Hadoop?
Thanks,

From: Yi Tian [mailto:[email protected]]
Sent: Sunday, October 26, 2014 9:08 PM
To: Pagliari, Roberto
Cc: [email protected]
Subject: Re: Spark SQL configuration

You can add `HADOOP_CONF_DIR=your_hadoop_conf_path` to `conf/spark-env.sh` to:

1. connect to your YARN cluster
2. set `hdfs` as the default FileSystem; otherwise you have to write `hdfs://` before every path you define, like:

`val input = sc.textFile("hdfs://user/spark/test.dat")`

Best Regards,

Yi Tian
[email protected]

On Oct 27, 2014, at 07:59, Pagliari, Roberto <[email protected]> wrote:

I'm a newbie with Spark. After installing it on all the machines I want to use, do I need to tell it about the Hadoop configuration, or will it be able to find it by itself?

Thank you,
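For reference, the `conf/spark-env.sh` change Yi describes might look like the sketch below. The path is only an example; point it at wherever your cluster keeps its Hadoop client configuration files.

```shell
# conf/spark-env.sh (sketch; the path below is an assumption --
# substitute the directory that holds your cluster's Hadoop config)

# Directory containing core-site.xml, hdfs-site.xml, yarn-site.xml, etc.
# Spark reads these to find the YARN ResourceManager and the default
# FileSystem (fs.defaultFS), so plain paths resolve against HDFS.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

With this in place, a path without a scheme, e.g. `sc.textFile("/user/spark/test.dat")`, resolves against the default FileSystem declared in `core-site.xml` instead of the local disk.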
