Hi all,
I'm trying to convert my old MapReduce job to a Spark one, but I have some
doubts.
My application basically buffers a batch of updates and, every 100 elements,
flushes the batch to a server. This is very easy in MapReduce, but I don't
know how to do the same thing in Spark/Scala.
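For reference, the old mapper does roughly this (a simplified sketch; flush()
here just prints, the real code pushes the buffered entities to the server):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper
import scala.collection.mutable.ArrayBuffer

class LegacyMapper extends Mapper[LongWritable, Text, Text, Text] {

  private val batch = new ArrayBuffer[String]()

  // stand-in for the real call that pushes the buffered entities to the server
  private def flush(): Unit = {
    println(s"Flush legacy entities (${batch.size})")
    batch.clear()
  }

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    batch += value.toString
    if (batch.size == 100) flush()   // flush every 100 elements
  }

  // cleanup() runs once at the end of the task and commits whatever is left
  override def cleanup(context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    if (batch.nonEmpty) flush()
  }
}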

My first attempt in Spark looks like this:

import scala.collection.mutable.ArrayBuffer

myRdd.map { x =>
  // the buffer is local to each element here, which is probably not what I want
  val batch = new ArrayBuffer[someClass]()
  batch += someClassFactory.fromString(x._2, x._3)
  if (batch.size % 100 == 0) {
    println("Flush legacy entities")
    batch.clear()
  }
  // ...
}

I still have two problems:

 - how can I commit the remaining elements at the end (in MapReduce, the
elements still sitting in the batch array are flushed in the cleanup() method)?
 - if I have to create a connection to a server for pushing updates, is it
better to use mapPartitions instead of map? (a rough sketch of what I have in
mind is below)
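
Something like this is what I have in mind with mapPartitions (just a sketch;
LegacyEntity, someClassFactory and PushClient below are stubs standing in for
my real entity class, factory and server client, and myRdd is the same RDD as
above):

import scala.collection.mutable.ArrayBuffer

// stubs standing in for my real classes
case class LegacyEntity(a: String, b: String)
object someClassFactory {
  def fromString(a: String, b: String): LegacyEntity = LegacyEntity(a, b)
}
class PushClient {
  def push(batch: Seq[LegacyEntity]): Unit =
    println(s"Flush legacy entities (${batch.size})")   // real code would send the batch here
  def close(): Unit = ()
}

val flushed = myRdd.mapPartitions { iter =>
  val client = new PushClient()                // one connection per partition, not per element
  val batch = new ArrayBuffer[LegacyEntity]()
  var pushed = 0L
  iter.foreach { x =>
    batch += someClassFactory.fromString(x._2, x._3)
    if (batch.size == 100) {                   // flush a full batch of 100
      client.push(batch.toSeq)
      pushed += batch.size
      batch.clear()
    }
  }
  if (batch.nonEmpty) {                        // commit the remainder, like cleanup() in MapReduce
    client.push(batch.toSeq)
    pushed += batch.size
    batch.clear()
  }
  client.close()
  Iterator(pushed)                             // mapPartitions has to return an iterator
}
flushed.count()                                // action to actually trigger the pushes

Or, since I only need the side effect of pushing the updates, would
foreachPartition be more appropriate than mapPartitions here?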

Best,
Flavio
