You will face problems if the Spark version isn't compatible with your Hadoop version. (Let's say you have Hadoop 2.x but downloaded Spark pre-built against Hadoop 1.x; that would be a problem.) Of course, you can use Spark without providing any Hadoop configuration at all, as long as you aren't trying to access HDFS.

If your Spark build's Hadoop version matches the Hadoop version on your cluster, then you can set HADOOP_CONF_DIR=/path/to/your/hadoop/conf/ inside the spark-env.sh file under $SPARK_HOME/conf/.
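For example, a minimal sketch (here /etc/hadoop/conf is just a placeholder for wherever your core-site.xml, hdfs-site.xml and yarn-site.xml live; spark-env.sh is sourced as a shell script, so exporting the variable is the usual way to set it):

    # $SPARK_HOME/conf/spark-env.sh
    # Point Spark at the directory containing your Hadoop client configs.
    export HADOOP_CONF_DIR=/etc/hadoop/conf

Once Spark picks that up, the fs.defaultFS from your core-site.xml becomes the default filesystem, so a scheme-less path in spark-shell reads from HDFS rather than from local disk:

    // sc is the SparkContext that spark-shell provides; the path is hypothetical
    val input = sc.textFile("/user/spark/test.dat")  // resolves against fs.defaultFS, e.g. hdfs://namenode:8020/user/spark/test.dat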
Thanks
Best Regards

On Mon, Oct 27, 2014 at 9:46 AM, Pagliari, Roberto <[email protected]> wrote:

> What is a YARN cluster?
>
> And does Spark necessarily need Hadoop already installed on the cluster?
> For example, can one download Spark and run it on a bunch of nodes, with no
> prior installation of Hadoop?
>
> Thanks,
>
>
> *From:* Yi Tian [mailto:[email protected]]
> *Sent:* Sunday, October 26, 2014 9:08 PM
> *To:* Pagliari, Roberto
> *Cc:* [email protected]
> *Subject:* Re: Spark SQL configuration
>
> You can write `HADOOP_CONF_DIR=your_hadoop_conf_path` in
> `conf/spark-env.sh` to:
>
> 1. connect to your YARN cluster
>
> 2. set `hdfs` as the default FileSystem; otherwise you have to write "hdfs://"
> before every path you define, like: `val input = sc.textFile("hdfs://user/spark/test.dat")`
>
>
> Best Regards,
>
> Yi Tian
> [email protected]
>
>
> On Oct 27, 2014, at 07:59, Pagliari, Roberto <[email protected]>
> wrote:
>
> I'm a newbie with Spark. After installing it on all the machines I want to
> use, do I need to tell it about the Hadoop configuration, or will it be able to
> find it by itself?
>
> Thank you,
