BTW, while I haven't actually used Redshift myself, I've seen many companies use both, typically with Spark for ETL and advanced analytics and Redshift for SQL on the cleaned / summarized data. Xiangrui Meng also wrote https://github.com/mengxr/redshift-input-format to make it easy to read data exported from Redshift into Spark or Hadoop.
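To give a rough idea of what reading exported Redshift data involves, here is a minimal sketch in plain Python. It does not use the library above. It assumes the data was written by UNLOAD with the ESCAPE option, so fields are separated by "|" and a literal "|" or "\" inside a field is preceded by a backslash. The function name split_unload_line is hypothetical, and the sketch assumes one record per line, so it does not handle escaped newlines inside a field.

```python
def split_unload_line(line, delimiter="|", escape="\\"):
    """Split one exported record on unescaped delimiters.

    Assumption: the export escapes special characters with a backslash
    (as with UNLOAD ... ESCAPE). This helper is illustrative only.
    """
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == escape:
            # Take the next character literally; a trailing escape is kept as-is.
            current.append(next(chars, escape))
        elif ch == delimiter:
            # An unescaped delimiter closes the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    # The middle field contains an escaped pipe.
    sample = "42|ad\\|hoc query|2014-11-04"
    print(split_unload_line(sample))
```

In a Spark job, a function like this could be applied to each line of the exported files, e.g. via a map over the RDD of lines, provided no field contains a newline.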
Matei

> On Nov 4, 2014, at 3:51 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> Is this about Spark SQL vs Redshift, or Spark in general? Spark in general
> provides a broader set of capabilities than Redshift because it has APIs in
> general-purpose languages (Java, Scala, Python) and libraries for things like
> machine learning and graph processing. For example, you might use Spark to do
> the ETL that will put data into a database such as Redshift, or you might
> pull data out of Redshift into Spark for machine learning. On the other hand,
> if *all* you want to do is SQL and you are okay with the set of data formats
> and features in Redshift (i.e. you can express everything using its UDFs and
> you have a way to get data in), then Redshift is a complete service which
> will do more management out of the box.
>
> Matei
>
>> On Nov 4, 2014, at 3:11 PM, agfung <agf...@gmail.com> wrote:
>>
>> I'm in the midst of a heated debate about the use of Redshift v Spark with a
>> colleague. We keep trading anecdotes and links back and forth (eg airbnb
>> post from 2013 or amplab benchmarks), and we don't seem to be getting
>> anywhere.
>>
>> So before we start down the prototype /benchmark road, and in desperation
>> of finding *some* kind of objective third party perspective, was wondering
>> if anyone who has used both in 2014 would care to provide commentary about
>> the sweet spot use cases / gotchas for non trivial use (eg a simple filter
>> scan isn't really interesting). Soft issues like operational maintenance
>> and time spent developing v out of the box are interesting too...
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-v-Redshift-tp18112.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org