But you might run into performance issues. I can't speak to the Spark side of this, but with Hadoop MapReduce, Sqoop might be a solution for handling the export to the database with care.
Bertrand Dechoux

On Fri, Mar 14, 2014 at 4:47 AM, Christopher Nguyen <c...@adatao.com> wrote:

> Nicholas,
>
> (Can we make that a thing? Let's make that a thing. :)
>
> Yes, we're soon releasing something called Distributed DataFrame (DDF) to
> the community that will make this (among other useful idioms) "a
> (straightforward) thing" for Spark.
>
> Sent while mobile. Pls excuse typos etc.
>
> On Mar 13, 2014 2:05 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com> wrote:
>
>> My fellow welders <https://www.google.com/search?q=welding+sparks&tbm=isch>,
>>
>> (Can we make that a thing? Let's make that a thing. :)
>>
>> I'm trying to wedge Spark into an existing model where we process and
>> transform some data and then load it into an MPP database. I know that
>> part of the sell of Spark and Shark is that you shouldn't have to copy
>> data around like this, so please bear with me. :)
>>
>> Say I have an RDD of about 10GB in size that's cached in memory. What is
>> the best/fastest way to push that data into an MPP database like
>> Redshift <http://aws.amazon.com/redshift/>? Has anyone done something
>> like this?
>>
>> I'm assuming that pushing the data straight from memory into the
>> database is much faster than writing the RDD to HDFS and then COPY-ing
>> it from there into the database.
>>
>> Is there, for example, a way to perform a bulk load into the database
>> that runs on each partition of the in-memory RDD in parallel?
>>
>> Nick
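
Regarding Nick's question about a bulk load that runs on each partition in
parallel: here is a minimal sketch of that idiom using foreachPartition with
plain JDBC. The connection URL, credentials, table name, and the (id, payload)
row shape are all made up for illustration; treat it as a pattern, not a
tested recipe.

import java.sql.DriverManager
import org.apache.spark.SparkContext

object PushRddToDb {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext("local[4]", "rdd-to-db")
    // Stand-in for the 10GB cached RDD from the original question.
    val rdd = sc.parallelize(1 to 1000).map(i => (i, s"row-$i")).cache()

    // Hypothetical connection details -- substitute your own endpoint.
    val jdbcUrl = "jdbc:postgresql://mpp-host:5439/mydb"

    // Each partition opens its own connection and sends its rows as a
    // single JDBC batch, so partitions load in parallel across the workers.
    rdd.foreachPartition { rows =>
      val conn = DriverManager.getConnection(jdbcUrl, "loader", "secret")
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement(
        "INSERT INTO my_table (id, payload) VALUES (?, ?)")
      try {
        rows.foreach { case (id, payload) =>
          stmt.setInt(1, id)
          stmt.setString(2, payload)
          stmt.addBatch()
        }
        stmt.executeBatch()
        conn.commit()
      } finally {
        stmt.close()
        conn.close()
      }
    }
    sc.stop()
  }
}

Note that for Redshift specifically, many small INSERTs over JDBC are known to
be much slower than a COPY from S3, so writing each partition out and issuing
a single COPY may still be the faster path; the sketch above only shows the
parallel per-partition shape.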