It worked for me:

a = []
for i in range(0, 10000):
    a.append(i)

def f(iterator):
    yield sum(1 for _ in iterator)

print sc.parallelize(a, 16).mapPartitions(lambda x: f(x)).collect()

13/09/27 17:58:26 INFO spark.SparkContext: Job finished: collect at
NativeMethodAccessorImpl.java:-2, took 0.172441 s
[625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625]

--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org


On Thu, Sep 26, 2013 at 10:08 PM, Shangyu Luo <[email protected]> wrote:

> I can see the test for ParallelCollectionRDD.slice().
> But how do I explain the result of my test?
> The following is the simple code I used for the test:
>
> a = []
> for i in range(0, 10000):
>     a.append(i)
> print sc.parallelize(a, 16).mapPartitions(lambda x: f(x)).collect()
>
> and the result is [0, 523776, 0, 1572352, 2620928, 0, 3669504, 4718080, 0,
> 5766656, 0, 6815232, 7863808, 0, 8912384, 7532280]
>
>
> 2013/9/26 Mike <[email protected]>
>
>> > It does in fact attempt to do that, and the tests check for that, but
>> > there's no guarantee in its API. Of course "equally" here means +/-
>> > one element.
>>
>> Correction: *except* when the Seq is a NumericRange: then it shorts the
>> last partition. E.g., 91 elements split 10 ways -> 10 elements in the
>> first 9 partitions, one in the last.
>>
>
> --
> Shangyu Luo
> Department of Computer Science
> Rice University
>
> --
> Not Just Think About It, But Do It!
> --
> Success is never final.
> --
> Losers always whine about their best
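[Editor's note: the two slicing behaviors discussed in the thread can be sketched in plain Python, without Spark. `slice_evenly` and `slice_ceil` below are hypothetical helpers written for illustration, not Spark's actual ParallelCollectionRDD.slice() implementation. `slice_evenly` produces the "+/- one element" partitioning Reynold observed (16 partitions of 625), while `slice_ceil` shows the fixed-step behavior Mike describes for NumericRange, where the last partition gets shorted.]

```python
import math

def slice_evenly(seq, num_slices):
    """Near-equal slices: partition sizes differ by at most one element."""
    n = len(seq)
    return [seq[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

def slice_ceil(seq, num_slices):
    """Fixed-step slices of size ceil(n / num_slices); the last
    partition absorbs the whole shortfall."""
    step = math.ceil(len(seq) / num_slices)
    return [seq[i * step:(i + 1) * step] for i in range(num_slices)]

print([len(p) for p in slice_evenly(range(10000), 16)])  # [625] * 16
print([len(p) for p in slice_ceil(range(91), 10)])       # [10]*9 + [1]
```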
