My fellow welders <https://www.google.com/search?q=welding+sparks&tbm=isch>,
(Can we make that a thing? Let's make that a thing. :)

I'm trying to wedge Spark into an existing model where we process and transform some data and then load it into an MPP database. I know that part of the sell of Spark and Shark is that you shouldn't have to copy data around like this, so please bear with me. :)

Say I have an RDD of about 10GB in size that's cached in memory. What is the best/fastest way to push that data into an MPP database like Redshift<http://aws.amazon.com/redshift/>? Has anyone done something like this?

I'm assuming that pushing the data straight from memory into the database is much faster than writing the RDD to HDFS and then COPY-ing it from there into the database. Is there, for example, a way to perform a bulk load into the database that runs on each partition of the in-memory RDD in parallel?

Nick
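P.S. To make the question concrete, here's a rough sketch of the kind of thing I'm imagining: use foreachPartition to open one JDBC connection per partition and batch-insert that partition's rows, so all partitions load in parallel. (Redshift speaks the Postgres wire protocol, so the stock Postgres JDBC driver should work.) The connection URL, credentials, table name, and row type below are all made up for illustration:

import java.sql.DriverManager
import org.apache.spark.rdd.RDD

// Hypothetical: an already-cached RDD of (id, payload) rows.
def bulkLoad(rdd: RDD[(Long, String)]): Unit = {
  // Connection details and table name are placeholders.
  val url = "jdbc:postgresql://example.redshift.amazonaws.com:5439/mydb"

  rdd.foreachPartition { rows =>
    // One connection per partition; partitions load in parallel across workers.
    val conn = DriverManager.getConnection(url, "user", "password")
    conn.setAutoCommit(false)
    val stmt = conn.prepareStatement(
      "INSERT INTO events (id, payload) VALUES (?, ?)")
    try {
      rows.foreach { case (id, payload) =>
        stmt.setLong(1, id)
        stmt.setString(2, payload)
        stmt.addBatch()
      }
      stmt.executeBatch() // send the whole partition as one batch
      conn.commit()
    } finally {
      stmt.close()
      conn.close()
    }
  }
}

Though from what I've read, Redshift's INSERT path is much slower than its COPY path, so maybe the smarter per-partition move is to write each partition out to S3 and then issue a single COPY from there. Curious whether anyone has compared the two.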