Re: Spark v Redshift

Matei Zaharia Tue, 04 Nov 2014 15:55:06 -0800

BTW while I haven't actually used Redshift, I've seen many companies that use 
both, usually using Spark for ETL and advanced analytics and Redshift for SQL 
on the cleaned / summarized data. Xiangrui Meng also wrote 
https://github.com/mengxr/redshift-input-format to make it easy to read data 
exported from Redshift into Spark or Hadoop.


Matei

> On Nov 4, 2014, at 3:51 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> 
> Is this about Spark SQL vs Redshift, or Spark in general? Spark in general 
> provides a broader set of capabilities than Redshift because it has APIs in 
> general-purpose languages (Java, Scala, Python) and libraries for things like 
> machine learning and graph processing. For example, you might use Spark to do 
> the ETL that will put data into a database such as Redshift, or you might 
> pull data out of Redshift into Spark for machine learning. On the other hand, 
> if *all* you want to do is SQL and you are okay with the set of data formats 
> and features in Redshift (i.e. you can express everything using its UDFs and 
> you have a way to get data in), then Redshift is a complete service which 
> will do more management out of the box.
> 
> Matei
> 
>> On Nov 4, 2014, at 3:11 PM, agfung <agf...@gmail.com> wrote:
>> 
>> I'm in the midst of a heated debate with a colleague about the use of
>> Redshift v Spark. We keep trading anecdotes and links back and forth (e.g. the
>> Airbnb post from 2013 or the AMPLab benchmarks), and we don't seem to be
>> getting anywhere.
>> 
>> So before we start down the prototype/benchmark road, and in the hope of
>> finding *some* kind of objective third-party perspective, I was wondering
>> if anyone who has used both in 2014 would care to comment on the sweet-spot
>> use cases and gotchas for non-trivial use (e.g. a simple filter scan isn't
>> really interesting). Soft issues like operational maintenance and time
>> spent developing v out of the box are interesting too...
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-v-Redshift-tp18112.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 
> 

