Looks like reduceByKey() should work here.

Cheers

On Sat, Jul 11, 2015 at 11:02 AM, leonida.gianfagna <leonida.gianfa...@gmail.com> wrote:
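The TypeError quoted below suggests that groupByKey() was applied to an already-grouped RDD, so each item in the iterator is itself a ResultIterable rather than an int. A plain-Python sketch (no Spark; the `pairs` data is a hypothetical word-count input, as something like `words.map(lambda w: (w, 1))` would produce) showing both the failure shape and the reduceByKey-style fold:

```python
from collections import defaultdict

# Hypothetical (word, 1) pairs, as a map step over the words would emit.
pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]

# The reported error has the same shape as summing nested iterables:
# each item is itself an iterable, so `sum += item` mixes int and iterable.
try:
    sum([[1], [2, 3]])
except TypeError:
    print("cannot add int and list, just like int and ResultIterable")

# groupByKey-style: build a per-key list of values, then sum each list.
grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)
counts_grouped = {k: sum(vs) for k, vs in grouped.items()}

# reduceByKey-style: fold each value into a running total per key,
# with no per-key list materialized at all.
counts_reduced = defaultdict(int)
for k, v in pairs:
    counts_reduced[k] += v

print(counts_grouped)        # {'a': 3, 'b': 1, 'c': 1}
print(dict(counts_reduced))  # same result
```

In Spark terms the second shape is `pairs.reduceByKey(lambda a, b: a + b)`, which also avoids shipping whole per-key iterables across the shuffle.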
> Thanks a lot oubrik,
>
> I got your point; my consideration is that sum() should already be a
> built-in function for iterators in Python. Anyway, I tried your approach:
>
>     def mysum(iter):
>         count = sum = 0
>         for item in iter:
>             count += 1
>             sum += item
>         return sum
>
>     wordCountsGrouped = wordsGrouped.groupByKey().map(
>         lambda (w, iterator): (w, mysum(iterator)))
>     print wordCountsGrouped.collect()
>
> but I get the error below, any idea?
>
>     TypeError: unsupported operand type(s) for +=: 'int' and 'ResultIterable'
>
>         at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:135)
>         at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> thx
> Leonida
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Sum-elements-of-an-iterator-inside-an-RDD-tp23775p23778.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org