But you might run into performance issues. I don't know how this applies
to Spark, but with Hadoop MapReduce, Sqoop can be a solution for exporting
to the database while treating it with care.
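For the parallel bulk-load question, one common pattern with Spark is to
open a JDBC connection per partition via foreachPartition and batch the
inserts. A minimal sketch, where the connection URL, credentials, and the
"events" table/schema are made up for illustration:

    import java.sql.DriverManager

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelJdbcLoad {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parallel-jdbc-load"))

        // Stand-in for the cached 10GB RDD from the question.
        val rdd = sc.parallelize(Seq((1, "alpha"), (2, "beta")))

        rdd.foreachPartition { rows =>
          // One connection per partition, opened on the worker, so each
          // partition loads in parallel with the others.
          val conn = DriverManager.getConnection(
            "jdbc:postgresql://host:5439/mydb", // hypothetical URL/credentials
            "user", "password")
          conn.setAutoCommit(false)
          val stmt = conn.prepareStatement(
            "INSERT INTO events (id, payload) VALUES (?, ?)") // hypothetical table
          try {
            rows.foreach { case (id, payload) =>
              stmt.setInt(1, id)
              stmt.setString(2, payload)
              stmt.addBatch()
            }
            stmt.executeBatch() // flush the partition in one batched round trip
            conn.commit()
          } finally {
            stmt.close()
            conn.close()
          }
        }

        sc.stop()
      }
    }

One caveat: for Redshift specifically, row-by-row INSERTs are slow even
when batched; Amazon recommends staging files (e.g. in S3) and running a
single COPY, so the parallel-insert pattern above fits general JDBC/MPP
targets better than Redshift itself.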

Bertrand Dechoux


On Fri, Mar 14, 2014 at 4:47 AM, Christopher Nguyen <c...@adatao.com> wrote:

> Nicholas,
>
> > (Can we make that a thing? Let's make that a thing. :)
>
> Yes, we're soon releasing something called Distributed DataFrame (DDF) to
> the community that will make this (among other useful idioms) "a
> (straightforward) thing" for Spark.
>
> Sent while mobile. Pls excuse typos etc.
> On Mar 13, 2014 2:05 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com>
> wrote:
>
>> My fellow welders <https://www.google.com/search?q=welding+sparks&tbm=isch>,
>>
>> (Can we make that a thing? Let's make that a thing. :)
>>
>> I'm trying to wedge Spark into an existing model where we process and
>> transform some data and then load it into an MPP database. I know that part
>> of the sell of Spark and Shark is that you shouldn't have to copy data
>> around like this, so please bear with me. :)
>>
>> Say I have an RDD of about 10GB in size that's cached in memory. What is
>> the best/fastest way to push that data into an MPP database like
>> Redshift <http://aws.amazon.com/redshift/>?
>> Has anyone done something like this?
>>
>> I'm assuming that pushing the data straight from memory into the database
>> is much faster than writing the RDD to HDFS and then COPY-ing it from there
>> into the database.
>>
>> Is there, for example, a way to perform a bulk load into the database
>> that runs on each partition of the in-memory RDD in parallel?
>>
>> Nick
>>
>>
>
