Hello Daniel,

If you're not using Hadoop, then why do you want to grab the Hadoop package? CDH5 will download all the Hadoop packages and Cloudera Manager too.
Just curious: if you start Spark on an EC2 cluster, what does it choose as the default data store?

-Sanjeev

From: Daniel Siegmann [mailto:[email protected]]
Sent: Thursday, August 28, 2014 2:04 PM
To: Sagar, Sanjeev
Cc: [email protected]
Subject: Re: Q on downloading spark for standalone cluster

If you aren't using Hadoop, I don't think it matters which you download. I'd probably just grab the Hadoop 2 package.

Out of curiosity, what are you using as your data store? I get the impression most Spark users are using HDFS or something built on top.

On Thu, Aug 28, 2014 at 4:07 PM, Sanjeev Sagar <[email protected]<mailto:[email protected]>> wrote:

Hello there,

I have a basic question about the download: which option do I need to download for a standalone cluster? I have a private cluster of three machines on CentOS. When I click on download it shows me the following:

Download Spark

The latest release is Spark 1.0.2, released August 5, 2014 (release notes) <http://spark.apache.org/releases/spark-release-1-0-2.html> (git tag) <https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f>

Pre-built packages:

* For Hadoop 1 (HDP1, CDH3): find an Apache mirror <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz> or direct file download <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
* For CDH4: find an Apache mirror <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz> or direct file download <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
* For Hadoop 2 (HDP2, CDH5): find an Apache mirror <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz> or direct file download <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>

Pre-built packages, third-party (NOTE: may include non ASF-compatible licenses):

* For MapRv3: direct file download (external)
<http://package.mapr.com/tools/apache-spark/1.0.2/spark-1.0.2-bin-mapr3.tgz>
* For MapRv4: direct file download (external) <http://package.mapr.com/tools/apache-spark/1.0.2/spark-1.0.2-bin-mapr4.tgz>

From the above it looks like I have to download Hadoop or CDH4 first in order to use Spark? I have a standalone cluster, and my data size is in the hundreds of gigabytes, close to a terabyte. I don't understand which one I need to download from the above list.

Could someone tell me which one I need to download for a standalone cluster with a big data footprint? And is Hadoop needed or mandatory for using Spark? That's not my understanding. My understanding is that you can use Spark with Hadoop if you like, via YARN in Hadoop 2, but you can also use Spark standalone without Hadoop.

Please assist. I'm confused!

-Sanjeev

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]<mailto:[email protected]>
For additional commands, e-mail: [email protected]<mailto:[email protected]>

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: [email protected]<mailto:[email protected]> W: www.velos.io<http://www.velos.io>
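[For readers finding this thread later: Daniel's suggestion above (grab the Hadoop 2 pre-built package and run Spark standalone, no Hadoop installation required) can be sketched roughly as follows. The mirror URL is the one from the download list in this thread; the master host name is illustrative, and the download/launch commands are left commented out as a sketch.]

```shell
#!/bin/sh
# Sketch: standalone Spark 1.0.2 cluster from the Hadoop 2 pre-built
# package. The "Hadoop 2" build just bundles Hadoop client libraries;
# it does not require a Hadoop cluster or HDFS.
SPARK_VERSION=1.0.2
PACKAGE="spark-${SPARK_VERSION}-bin-hadoop2"

# On each of the three machines (mirror from the download page above):
# wget http://d3kbcqa49mib13.cloudfront.net/${PACKAGE}.tgz
# tar xzf ${PACKAGE}.tgz

# On the machine chosen as master:
# ./${PACKAGE}/sbin/start-master.sh

# On each worker, connect to the master's spark:// URL
# (host name illustrative; port 7077 is the standalone default):
# ./${PACKAGE}/bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077

echo "${PACKAGE}.tgz"
```

Without HDFS, jobs can still read input via local paths (e.g. file:// URLs), provided the data is visible at the same path on every worker.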
