Sean, would this work --

rdd.mapPartitions { partition => Iterator(partition) }.foreach { partition =>

   // Some setup code here
   // save partition to DB
   // Some cleanup code here
}


I tried a pretty simple example ... I can see that the setup and
cleanup are executed on the executor node, once per partition (I used
mapPartitionsWithIndex instead of mapPartitions to track this a little
better). It seems like an easier solution than Tobias's, but I'm
wondering if it's perhaps incorrect.
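Concretely, here's the shape of what I'm picturing, sketched with a stub
Connection class standing in for a real DB client so it runs without a
cluster; on a real RDD the call would just be
rdd.foreachPartition(savePartition):

```scala
// Self-contained sketch of the setup / save / cleanup pattern. The
// Connection class below is a stub standing in for a real DB client.
class Connection {
  var open = true
  val saved = scala.collection.mutable.Buffer[Int]()
  def insert(x: Int): Unit = { require(open, "connection closed"); saved += x }
  def close(): Unit = { open = false }
}

def savePartition(partition: Iterator[Int]): Connection = {
  val conn = new Connection          // setup: runs once per partition
  try {
    partition.foreach(conn.insert)   // save each record in the partition
  } finally {
    conn.close()                     // cleanup: runs once per partition
  }
  conn
}
```

The try/finally makes sure the connection is closed even if a write
fails, which the bare setup/save/cleanup sequence in the sketch above
would not.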




On Mon, Aug 18, 2014 at 3:29 AM, Henry Hung <ythu...@winbond.com> wrote:

> I slightly modified the code to use while(partitions.hasNext) { } instead of
> partitions.map(func).
> I suppose this eliminates the uncertainty from lazy execution.
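
(Henry's point about replacing partitions.map(func) is the key gotcha
here: map on a Scala Iterator is lazy, so func never runs unless
something actually consumes the result. A quick illustration in plain
Scala, no Spark needed:)

```scala
// Iterator.map is lazy: the function is deferred until the iterator
// is consumed.
var calls = 0
val mapped = Iterator(1, 2, 3).map { x => calls += 1; x * 2 }
// Nothing has run yet: map only built a new, unconsumed iterator.
assert(calls == 0)

// Consuming the iterator (toList, foreach, or a while loop) runs the work:
val result = mapped.toList
assert(calls == 3)
assert(result == List(2, 4, 6))
```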
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Monday, August 18, 2014 3:10 PM
> To: MA33 YTHung1
> Cc: user@spark.apache.org
> Subject: Re: a noob question for how to implement setup and cleanup in
> Spark map
>
> I think there was a more comprehensive answer recently. Tobias is right
> that it is not quite that simple:
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E
>
> On Mon, Aug 18, 2014 at 8:04 AM, Henry Hung <ythu...@winbond.com> wrote:
> > Hi All,
> >
> >
> >
> > Please ignore my question; I found a way to implement it via old
> > archive mails:
> >
> >
> >
> > http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAF_KkPzpU4qZWzDWUpS5r9bbh=-hwnze2qqg56e25p--1wv...@mail.gmail.com%3E
> >
> >
> >
> > Best regards,
> >
> > Henry
> >
> >
> >
> > From: MA33 YTHung1
> > Sent: Monday, August 18, 2014 2:42 PM
> > To: user@spark.apache.org
> > Subject: a noob question for how to implement setup and cleanup in
> > Spark map
> >
> >
> >
> > Hi All,
> >
> >
> >
> > I’m new to Spark and Scala; I just recently started using this language
> > and love it, but I ran into a small coding problem when converting my
> > existing MapReduce code from Java to Spark…
> >
> >
> >
> > In Java, I create a class by extending
> > org.apache.hadoop.mapreduce.Mapper
> > and override the setup(), map() and cleanup() methods.
> >
> > But in Spark there is no method called setup(), so I put the setup()
> > code into map(), and it performs badly.
> >
> > The reason is that I create the database connection once in setup(),
> > map() executes the SQL queries, and cleanup() closes the connection.
> >
> > Could someone tell me how to do it in Spark?
> >
> >
> >
> > Best regards,
> >
> > Henry Hung
> >
> >
> >
> > ________________________________
> >
> > The privileged confidential information contained in this email is
> > intended for use only by the addressees as indicated by the original
> > sender of this email. If you are not the addressee indicated in this
> > email or are not responsible for delivery of the email to such a
> > person, please kindly reply to the sender indicating this fact and
> > delete all copies of it from your computer and network server
> > immediately. Your cooperation is highly appreciated. It is advised
> > that any unauthorized use of confidential information of Winbond is
> > strictly prohibited; and any information in this email irrelevant to
> > the official business of Winbond shall be deemed as neither given nor
> endorsed by Winbond.
> >
>
>
