Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-20 Thread Victor Tso-Guillen
How about this:

  val prev: RDD[V] = rdd.mapPartitions(partition => { /*setup()*/; partition })
  new RDD[V](prev) {
    protected def getPartitions = prev.partitions
    def compute(split: Partition, context: TaskContext) = {
      context.addOnCompleteCallback(() => /*cleanup()*/)
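Filled out, that sketch might look like the following, assuming the Spark 1.x TaskContext API of the time (addOnCompleteCallback was later replaced by addTaskCompletionListener); setup and cleanup stand for the user's own code:

  import scala.reflect.ClassTag
  import org.apache.spark.{Partition, TaskContext}
  import org.apache.spark.rdd.RDD

  // Wraps prev so that setup() runs before each partition is computed and
  // cleanup() runs when the task computing it completes.
  class SetupCleanupRDD[V: ClassTag](prev: RDD[V],
                                     setup: () => Unit,
                                     cleanup: () => Unit)
      extends RDD[V](prev) {

    override protected def getPartitions: Array[Partition] = prev.partitions

    override def compute(split: Partition, context: TaskContext): Iterator[V] = {
      setup()                                        // once per partition, on the executor
      context.addOnCompleteCallback(() => cleanup()) // fires when the task finishes
      prev.iterator(split, context)
    }
  }

Usage would be along the lines of new SetupCleanupRDD(rdd, () => openConn(), () => closeConn()), where openConn/closeConn are whatever resource management you need.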

Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-20 Thread Victor Tso-Guillen
And duh, of course, you can do the setup in that new RDD as well :) On Wed, Aug 20, 2014 at 1:59 AM, Victor Tso-Guillen v...@paxata.com wrote: How about this: val prev: RDD[V] = rdd.mapPartitions(partition => { /*setup()*/; partition }) new RDD[V](prev) { protected def getPartitions =

Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-19 Thread Yana Kadiyska
Sean, would this work --

  rdd.mapPartitions { partition => Iterator(partition) }.foreach(
    // Some setup code here
    // save partition to DB
    // Some cleanup code here
  )

I tried a pretty simple example ... I can see that the setup and cleanup are executed on the executor node, once per
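Spelled out, the trick Yana describes looks like this (a sketch; openConnection() and save() are hypothetical stand-ins). It works because mapPartitions and foreach are pipelined into the same task, so the wrapped Iterator is never serialized:

  // Each partition is wrapped in a one-element Iterator, so foreach sees the
  // whole partition at once and can bracket it with setup and cleanup.
  rdd.mapPartitions(partition => Iterator(partition)).foreach { wholePartition =>
    val conn = openConnection()                      // setup, once per partition
    try {
      wholePartition.foreach(row => save(conn, row)) // save partition to DB
    } finally {
      conn.close()                                   // cleanup, once per partition
    }
  }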

Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-19 Thread Sean Owen
I think you're looking for foreachPartition(). You've kinda hacked it out of mapPartitions(). Your case has a simple solution, yes. After saving to the DB, you know you can close the connection, since you know the use of the connection has definitely just finished. But it's not a simpler solution
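The foreachPartition() version Sean points to, again as a sketch with hypothetical openConnection()/save() helpers:

  rdd.foreachPartition { partition =>
    val conn = openConnection()                // setup, once per partition
    try {
      partition.foreach(row => save(conn, row))
    } finally {
      conn.close()                             // cleanup runs even if saving throws
    }
  }

Because foreachPartition is an action that hands you the whole partition iterator, the connection's lifetime brackets the writes with no laziness concerns.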

a noob question for how to implement setup and cleanup in Spark map

2014-08-18 Thread Henry Hung
Hi All, I'm new to Spark and Scala, having just recently started using this language, and I love it, but I have a small coding problem when converting my existing MapReduce code from Java to Spark... In Java, I create a class by extending org.apache.hadoop.mapreduce.Mapper and override the setup(),
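For context, the Hadoop shape being converted gives each Mapper instance a setup() call before its first record and a cleanup() call after its last. A minimal rendering of that shape in Scala (not Henry's actual code) is:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.Mapper

  class MyMapper extends Mapper[LongWritable, Text, Text, Text] {
    type Ctx = Mapper[LongWritable, Text, Text, Text]#Context

    override protected def setup(context: Ctx): Unit = {
      // e.g. open a connection, load side data -- runs once per input split
    }

    override protected def map(key: LongWritable, value: Text, context: Ctx): Unit = {
      context.write(new Text(key.toString), value)
    }

    override protected def cleanup(context: Ctx): Unit = {
      // e.g. close the connection -- runs once, after the last record
    }
  }

The question in Spark terms is where those once-per-split hooks can live, since mapPartitions gives you one function call per partition rather than overridable lifecycle methods.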

RE: a noob question for how to implement setup and cleanup in Spark map

2014-08-18 Thread Henry Hung
Hi All, please ignore my question; I found a way to implement it via an old archived mail: http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAF_KkPzpU4qZWzDWUpS5r9bbh=-hwnze2qqg56e25p--1wv...@mail.gmail.com%3E Best regards, Henry From: MA33 YTHung1 Sent: Monday, August 18, 2014

Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-18 Thread Akhil Das
You can create an RDD and then do a map or mapPartitions on it: at the top you create the database connection and whatever else you need, then do the operations, and at the end close the connections. Thanks Best Regards On Mon, Aug 18, 2014 at 12:34 PM, Henry Hung ythu...@winbond.com wrote:
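Spelled out (a sketch with hypothetical openConnection()/transform() helpers), that is the bracket below. Note the caveat that comes up elsewhere in the thread: partition.map is lazy, so the iterator has to be materialized before the close, otherwise the connection is gone by the time the rows are consumed:

  rdd.mapPartitions { partition =>
    val conn = openConnection()                                 // setup at the top
    val out = partition.map(row => transform(conn, row)).toList // force evaluation
    conn.close()                                                // close at the end
    out.iterator
  }

The .toList makes the cleanup safe, at the price of holding the whole partition in memory.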

Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-18 Thread Sean Owen
I think there was a more comprehensive answer to this recently. Tobias is right that it is not quite that simple: http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E On Mon, Aug 18, 2014 at 8:04 AM, Henry Hung

RE: a noob question for how to implement setup and cleanup in Spark map

2014-08-18 Thread Henry Hung
I slightly modified the code to use while (partitions.hasNext) { ... } instead of partitions.map(func). I suppose this can eliminate the uncertainty from lazy execution. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, August 18, 2014 3:10 PM To: MA33 YTHung1 Cc:
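A sketch of that modification (process() and the connection helpers are hypothetical; String stands in for whatever the processed row type is):

  rdd.mapPartitions { partition =>
    val conn = openConnection()
    val results = scala.collection.mutable.ArrayBuffer.empty[String]
    while (partition.hasNext) {      // eager: the iterator is drained here and now
      results += process(conn, partition.next())
    }
    conn.close()                     // every row has definitely been processed
    results.iterator
  }

Like the .toList variant above, this buffers the partition in memory, but the connection's use has provably finished before close() is called.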