Yes, I also tested it with an array of 100 elements and 10 partitions, and the result was all 0s except one element, which was the total sum.
2013/9/26 Horia &lt;[email protected]&gt;

> Silly question: does sc.parallelize guarantee that the items are always
> distributed equally across the partitions?
>
> It seems to me that, in the example above, all four items were assigned to
> the same partition. Have you tried the same with many more items?
>
> On Sep 26, 2013 9:01 PM, "Shangyu Luo" &lt;[email protected]&gt; wrote:
>
>> Hi,
>> I am trying to test the mapPartitions function in the Spark Python
>> version, but I got a wrong result.
>> More specifically, in the pyspark shell:
>>
>> >>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>> >>> def f(iterator): yield sum(iterator)
>> ...
>> >>> rdd.mapPartitions(f).collect()
>>
>> The result is [0, 10], not [3, 7].
>> Is there anything wrong with my code?
>> Thanks!

--
Shangyu Luo
Department of Computer Science
Rice University
--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best
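For reference, the behavior being discussed can be sketched in plain Python, without Spark: the hypothetical `simulate_map_partitions` helper below mimics how `sc.parallelize` splits an in-memory list into contiguous partitions and how `mapPartitions` hands `f` one iterator per partition. Under that model, the expected output of the example is [3, 7], one per-partition sum per partition.

```python
# Plain-Python sketch of rdd.mapPartitions(f).collect() semantics.
# simulate_map_partitions is a hypothetical helper, not a Spark API.
def simulate_map_partitions(data, num_partitions, f):
    n = len(data)
    # Split data into roughly equal contiguous slices, mirroring how
    # sc.parallelize partitions an in-memory list.
    partitions = [
        data[n * i // num_partitions : n * (i + 1) // num_partitions]
        for i in range(num_partitions)
    ]
    result = []
    for part in partitions:
        # f receives an iterator over one partition and yields results.
        result.extend(f(iter(part)))
    return result

def f(iterator):
    yield sum(iterator)

print(simulate_map_partitions([1, 2, 3, 4], 2, f))  # expected: [3, 7]
```

If each partition actually received its share of the elements, every per-partition sum would be nonzero here, so an output like [0, 10] suggests all items ended up in one partition (or the iterator was consumed before `f` saw it), which is what the thread is probing.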
