From what I understand, getting Spark to run alongside a Hadoop cluster
requires the following:

a) a working Hadoop cluster
b) a compiled Spark
c) configuration parameters that point Spark to the right Hadoop conf files
(my guess at this is sketched right after this list)
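
For (c), this is the sketch I currently have, assuming Spark 0.9 and that
the Hadoop client configuration lives under /etc/hadoop/conf (that path is
just a guess for my setup); I am not sure conf/spark-env.sh is the right
place, so please correct me if not:

    # conf/spark-env.sh -- point Spark at the existing Hadoop configuration
    # (the path is an assumption; it is wherever core-site.xml / hdfs-site.xml live)
    export HADOOP_CONF_DIR=/etc/hadoop/conf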

i) Can you let me know the specific steps to take after Spark has been
compiled (via sbt assembly in the spark-incubation folder) to ensure that
the benchmarks run and write to HDFS? (A sketch of what I am trying to do
is below, after question v.)
ii) Standalone mode means running Spark without a Hadoop cluster, correct?
iii) The "Launching Spark on YARN" guide (
https://spark.incubator.apache.org/docs/latest/running-on-yarn.html) is
specific to Hadoop 2.0.x, right? There is hardly any documentation in the
Spark docs on getting this running on Hadoop version 1. What are the steps?
iv) The cluster launch scripts described in (
https://spark.apache.org/docs/0.9.0/spark-standalone.html) need to be used
when Spark runs alongside Hadoop. Yes or no? (My guess at the commands is
also sketched below.)
v) To run Spark on Hadoop version 1, there is something called SIMR
(Spark In MapReduce). Has anyone tried this? Is it worth it?
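
To make question (i) concrete, this is roughly what I am trying to get
working from the Spark shell once the assembly is built (the namenode host,
port, and paths below are made up for my cluster):

    // launched via ./bin/spark-shell; sc is the SparkContext it provides
    val lines = sc.textFile("hdfs://namenode-host:9000/user/shivani/input.txt")
    val counts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://namenode-host:9000/user/shivani/wordcount-output")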
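
And for question (iv), this is my current guess at how the standalone
launch scripts would be used alongside Hadoop, based on the 0.9.0 page
above (the master hostname is made up, and I may be wrong that these
scripts are needed at all):

    # on the master node
    ./sbin/start-master.sh
    # workers listed in conf/slaves, then from the master node:
    ./sbin/start-slaves.sh
    # connect a shell to the standalone master (hostname/port are assumptions)
    MASTER=spark://master-host:7077 ./bin/spark-shell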

Thanks for your patience in answering my questions. If you have a useful
link for any or all of these questions, please share it.

Thanks,
Shivani
-- 
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA
