Re: Batch of updates

2014-10-29 Thread Sean Owen
I don't think accumulators come into play here. Use foreachPartition, not mapPartitions. On Wed, Oct 29, 2014 at 12:43 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Sorry, but I wasn't able to code my stuff using accumulators as you suggested :( In my use case I have to add elements to …
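
A minimal sketch of the distinction Sean is drawing (the RDD contents are placeholders): foreachPartition is an action, so its side effects are guaranteed to run, while mapPartitions is a lazy transformation whose body may never execute if its result is not consumed.

    import org.apache.spark.{SparkConf, SparkContext}

    object ForeachPartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
        val myRDD = sc.parallelize(1 to 1000)  // placeholder data

        // foreachPartition is an action: the body runs once per partition.
        myRDD.foreachPartition { iter =>
          iter.foreach(elem => println(elem))  // stand-in for the real side effect
        }

        // By contrast, mapPartitions is lazy: without an action on `mapped`,
        // this body (and its side effect) never runs at all.
        val mapped = myRDD.mapPartitions { iter =>
          iter.map { elem => println(elem); elem }
        }

        sc.stop()
      }
    }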

Re: Batch of updates

2014-10-28 Thread Kamal Banga
Hi Flavio, doing batch += ... won't work as intended: it will create a new batch for each element in myRDD (also, val declares an immutable variable; var is for mutable ones). You can use something like accumulators: http://spark.apache.org/docs/latest/programming-guide.html#accumulators. val …
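
The underlying mechanism: Spark serializes the closure and ships a copy of any captured variable (like batch) to each task, so mutations happen on per-task copies and the driver's collection is never updated. The Spark 1.x accumulator API from the linked guide looks roughly like this; note, as Sean points out above, that an accumulator only aggregates a value back to the driver and does not help with issuing batched writes from the workers.

    // Why `batch += ...` fails: each task mutates its own serialized copy.
    val batch = scala.collection.mutable.ArrayBuffer[Int]()
    myRDD.foreach(x => batch += x)
    println(batch.size)            // still 0 on the driver

    // Spark 1.x accumulator, as in the programming guide linked above.
    val accum = sc.accumulator(0, "elements seen")
    myRDD.foreach(x => accum += 1) // executors add to the accumulator
    println(accum.value)           // only the driver can read the result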

Re: Batch of updates

2014-10-28 Thread Sean Owen
You should use foreachPartition, and take care to open and close your connection following the pattern described in: http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E Within a partition, you iterate over …
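
Roughly, the pattern from the linked thread (createConnection and send are hypothetical stand-ins for your client library): open one connection per partition rather than per element, iterate, and close it in a finally block.

    myRDD.foreachPartition { iter =>
      val conn = createConnection()                // hypothetical: one connection per partition
      try {
        iter.foreach(record => conn.send(record))  // hypothetical per-record write
      } finally {
        conn.close()                               // always close, even on failure
      }
    }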

Re: Batch of updates

2014-10-28 Thread Flavio Pompermaier
Sorry, but I wasn't able to code my stuff using accumulators as you suggested :( In my use case I have to add elements to an array/list and then, every 100 elements, commit the batch to a Solr index and clear it. In the cleanup code I have to commit the uncommitted (remainder) elements. In …
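
One way to express this use case without a driver-side buffer at all, sketched against the SolrJ 4.x API of the time (the core URL and field mapping are placeholders): inside foreachPartition, Iterator.grouped(100) yields full batches of 100 followed by a final smaller remainder, which covers the cleanup-time commit of leftover elements.

    import org.apache.solr.client.solrj.impl.HttpSolrServer
    import org.apache.solr.common.SolrInputDocument
    import scala.collection.JavaConverters._

    myRDD.foreachPartition { iter =>
      val server = new HttpSolrServer("http://localhost:8983/solr/collection1")  // placeholder URL
      try {
        // grouped(100) emits batches of 100 and, at the end, the remainder,
        // so the "uncommitted elements" case needs no separate cleanup step.
        iter.grouped(100).foreach { batch =>
          val docs = batch.map { elem =>
            val doc = new SolrInputDocument()
            doc.addField("id", elem.toString)  // hypothetical field mapping
            doc
          }
          server.add(docs.asJava)
          server.commit()
        }
      } finally {
        server.shutdown()
      }
    }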