Hi all, I'm trying to convert my old MapReduce job to a Spark one, but I have some doubts. My application basically buffers a batch of updates and, every 100 elements, flushes the batch to a server. This is very easy in MapReduce, but I don't know how to do it in Spark/Scala.
For example, if I do something like:

    myRdd.map { x =>
      val batch = new ArrayBuffer[SomeClass]()
      batch += someClassFactory.fromString(x._2, x._3)
      if (batch.size % 100 == 0) {
        println("Flush legacy entities")
        // push the batch to the server here
        batch.clear()
      }
    }

I still have two problems:
- how can I commit the remaining elements at the end (in MapReduce, the elements still sitting in the batch array get flushed in the cleanup() method)?
- if I have to create a connection to a server for pushing the updates, is it better to use mapPartitions instead of map? (I've put a sketch of what I have in mind below.)
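From what I've read so far, I was thinking of something like the following sketch, using foreachPartition so that one connection is opened per partition and the leftover batch can be flushed once the iterator is exhausted. PushClient and serverUrl are just placeholders for my real client and connection details:

    import scala.collection.mutable.ArrayBuffer

    myRdd.foreachPartition { iter =>
      val client = new PushClient(serverUrl)  // placeholder for my real client
      val batch = new ArrayBuffer[SomeClass]()
      iter.foreach { x =>
        batch += someClassFactory.fromString(x._2, x._3)
        if (batch.size == 100) {
          client.push(batch.toList)  // flush a full batch
          batch.clear()
        }
      }
      if (batch.nonEmpty)
        client.push(batch.toList)    // flush the remainder (the cleanup() equivalent?)
      client.close()
    }

Does this look like the right approach, or is there a more idiomatic way to do it?

Best,
Flavio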