OK. Thank you all. I am currently using 0.7.3.
2013/9/28 Matei Zaharia <[email protected]>

> This was actually a bug in the parallelize() version for Python that
> should be fixed in Spark 0.8. It may also be fixed in 0.7.3.
>
> Matei
>
> On Sep 27, 2013, at 8:59 PM, Reynold Xin <[email protected]> wrote:
>
> It worked for me:
>
> a = []
> for i in range(0, 10000):
>     a.append(i)
>
> def f(iterator): yield sum(1 for _ in iterator)
>
> print sc.parallelize(a, 16).mapPartitions(lambda x: f(x)).collect()
>
> 13/09/27 17:58:26 INFO spark.SparkContext: Job finished: collect at
> NativeMethodAccessorImpl.java:-2, took 0.172441 s
> [625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625,
> 625, 625]
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
> On Thu, Sep 26, 2013 at 10:08 PM, Shangyu Luo <[email protected]> wrote:
>
>> I can see the test for ParallelCollectionRDD.slice(),
>> but how can the result of my test be explained?
>> The following is the simple code I used for the test:
>>
>> a = []
>> for i in range(0, 10000):
>>     a.append(i)
>> print sc.parallelize(a, 16).mapPartitions(lambda x: f(x)).collect()
>>
>> The result is [0, 523776, 0, 1572352, 2620928, 0, 3669504, 4718080,
>> 0, 5766656, 0, 6815232, 7863808, 0, 8912384, 7532280].
>>
>> 2013/9/26 Mike <[email protected]>
>>
>>> > It does in fact attempt to do that, and the tests check for that, but
>>> > there's no guarantee in its API. Of course "equally" here means +/-
>>> > one element.
>>>
>>> Correction: *except* when the Seq is a NumericRange: then it shorts the
>>> last partition. E.g., 91 elements split 10 ways -> 10 elements in the
>>> first 9 partitions, one in the last.

--
Shangyu, Luo
Department of Computer Science
Rice University

--
Not Just Think About It, But Do It!
--
Success is never final.
--
Losers always whine about their best
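
P.S. For anyone hitting this thread later, here is a self-contained version of Reynold's per-partition count check. It is a minimal sketch that assumes a PySpark shell where `sc` is the SparkContext; the name `count_elements` is illustrative.

# Count how many elements land in each partition.
# Assumes `sc` is provided by the PySpark shell; on the buggy
# parallelize() versions discussed above, a plain Python list can
# be split very unevenly across partitions.
a = list(range(10000))

def count_elements(iterator):
    yield sum(1 for _ in iterator)

counts = sc.parallelize(a, 16).mapPartitions(count_elements).collect()
print(counts)  # on a fixed version: sixteen entries of 625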
