Hello Sparkies!

Could anyone please answer this? This is not a Hadoop cluster, so which 
download option should I use for a standalone cluster?

Also, what are the best practices if you have 1 TB of data and want to use Spark? 
Do you have to use Hadoop/CDH, or is there some other option?

Appreciate it.

From: Sagar, Sanjeev [mailto:[email protected]]
Sent: Thursday, August 28, 2014 2:44 PM
To: Daniel Siegmann
Cc: [email protected]
Subject: RE: Q on downloading spark for standalone cluster

Hello Daniel, if you're not using Hadoop, why would you want to grab the Hadoop 
package? CDH5 downloads all the Hadoop packages and Cloudera Manager too.

Just curious: what happens if you start Spark on an EC2 cluster? What does it 
choose as the default data store?

-Sanjeev

From: Daniel Siegmann [mailto:[email protected]]
Sent: Thursday, August 28, 2014 2:04 PM
To: Sagar, Sanjeev
Cc: [email protected]
Subject: Re: Q on downloading spark for standalone cluster

If you aren't using Hadoop, I don't think it matters which you download. I'd 
probably just grab the Hadoop 2 package.
Out of curiosity, what are you using as your data store? I get the impression 
most Spark users are using HDFS or something built on top.

On Thu, Aug 28, 2014 at 4:07 PM, Sanjeev Sagar 
<[email protected]> wrote:
Hello there,

I have a basic question about the download: which option do I need to download 
for a standalone cluster?

I have a private cluster of three machines on CentOS. When I click on download, 
it shows me the following:


   Download Spark

The latest release is Spark 1.0.2, released August 5, 2014 (release notes) 
<http://spark.apache.org/releases/spark-release-1-0-2.html> (git tag) 
<https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f>

Pre-built packages:

 * For Hadoop 1 (HDP1, CDH3): find an Apache mirror
   
<http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz>
   or direct file download
   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
 * For CDH4: find an Apache mirror
   
<http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz>
   or direct file download
   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
 * For Hadoop 2 (HDP2, CDH5): find an Apache mirror
   
<http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz>
   or direct file download
   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>

Pre-built packages, third-party (NOTE: may include non ASF-compatible licenses):

 * For MapRv3: direct file download (external)
   <http://package.mapr.com/tools/apache-spark/1.0.2/spark-1.0.2-bin-mapr3.tgz>
 * For MapRv4: direct file download (external)
   <http://package.mapr.com/tools/apache-spark/1.0.2/spark-1.0.2-bin-mapr4.tgz>


From the above it looks like I have to download Hadoop or CDH4 first in order 
to use Spark? I have a standalone cluster, and my data size is in the hundreds 
of gigabytes, close to a terabyte.

I don't get which one I need to download from the above list.

Could someone assist me with which one I need to download for a standalone 
cluster with a big data footprint?

Or is Hadoop needed or mandatory for using Spark? That's not my understanding. 
My understanding is that you can use Spark with Hadoop via YARN if you like, 
but you can also run Spark standalone without Hadoop.
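That understanding is right: the standalone mode ships with the download itself. A minimal sketch of launching a standalone cluster (assuming Spark 1.0.2 is unpacked at $SPARK_HOME on every node; "master-host", "worker1", and "worker2" are placeholder hostnames):

```shell
# On the master node: start the standalone master
# (cluster URL spark://master-host:7077, web UI on port 8080 by default).
$SPARK_HOME/sbin/start-master.sh

# List worker hostnames, one per line, in conf/slaves so the launch
# script can reach them over passwordless SSH.
printf 'worker1\nworker2\n' > $SPARK_HOME/conf/slaves

# From the master: start a worker process on each host in conf/slaves.
$SPARK_HOME/sbin/start-slaves.sh

# Submit an application against the standalone master. No Hadoop is
# required; input can be read from local paths visible on all nodes.
$SPARK_HOME/bin/spark-submit --master spark://master-host:7077 my_app.py
```

Without HDFS, any file path you read must be accessible at the same location on every worker (e.g. a shared NFS mount), which is worth keeping in mind at the near-terabyte scale you mention.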

Please assist. I'm confused!

-Sanjeev


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: [email protected]<mailto:[email protected]> W: 
www.velos.io<http://www.velos.io>
