Yes,
I also tested it with an array of 100 elements and 10 partitions, and the
result was all zeros except for one element, which held the total sum.
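For what it's worth, that result is consistent with every item landing in a single partition, leaving the other partitions empty (so `sum` over an empty iterator yields 0). A minimal pure-Python sketch, with no Spark required, of what mapPartitions does to each partition's iterator (the helper `map_partitions` and the explicit partition lists are illustrative assumptions, not Spark internals):

```python
def map_partitions(partitions, f):
    """Apply f to each partition's iterator and concatenate the results,
    mimicking RDD.mapPartitions on a list of in-memory partitions."""
    out = []
    for part in partitions:
        out.extend(f(iter(part)))
    return out

def f(iterator):
    # The generator from the original question: one value per partition.
    yield sum(iterator)

# If the four items are split evenly across two partitions:
print(map_partitions([[1, 2], [3, 4]], f))    # [3, 7]

# If all items end up in one partition and the other is empty:
print(map_partitions([[], [1, 2, 3, 4]], f))  # [0, 10]
```

This reproduces both outcomes seen in the thread, which suggests the function itself is fine and the surprise is purely in how the items were distributed across partitions.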


2013/9/26 Horia <[email protected]>

> Silly question: does sc.parallelize guarantee the allocation of the items
> to always be distributed equally across the partitions?
>
> It seems to me that, in the example above, all four items were assigned to
> the same partition. Have you tried the same with many more items?
> On Sep 26, 2013 9:01 PM, "Shangyu Luo" <[email protected]> wrote:
>
>> Hi,
>> I am trying to test the mapPartitions function in the Spark Python version,
>> but I got the wrong result.
>> More specifically, in pyspark shell:
>> >>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>> >>> def f(iterator): yield sum(iterator)
>> ...
>> >>> rdd.mapPartitions(f).collect()
>> The result is [0, 10], not [3, 7]
>> Is there anything wrong with my code?
>> Thanks!
>>
>>
>> --
>>
>> Shangyu, Luo
>> Department of Computer Science
>> Rice University
>>
>>


--
Shangyu, Luo
Department of Computer Science
Rice University
