Yes, I unioned four RDDs of 32 partitions each.
Thank you,
Hussam

From: Matei Zaharia [mailto:[email protected]]
Sent: Wednesday, November 13, 2013 10:37 PM
To: [email protected]
Subject: Re: interesting finding per using union

Union just puts the data in two RDDs together, so you get an RDD containing the elements of both, and with the partitions that would've been in both. It's not a unique set union (that would be union() then distinct()). Here you've unioned four RDDs of 32 partitions each to get 128.

If you want to have fewer partitions in the final RDD, but do want to include all that data together, you can call coalesce() after unioning them.

Matei

On Nov 13, 2013, at 6:33 PM, [email protected] wrote:

Hi,

I am creating an initial JavaRDD with 32 partitions, then looping over my data and unioning each new RDD with it, as follows:

    JavaRDD<String> dataSetRDD = null;
    JavaRDD<String> unionDataSetRDD = null;
    for (..) {
        if (0 == i) {
            unionDataSetRDD = SparkDriver.getSparkContext().parallelize(finalresult, 32);
        } else {
            dataSetRDD = SparkDriver.getSparkContext().parallelize(finalresult, 32);
            unionDataSetRDD = unionDataSetRDD.union(dataSetRDD);
        }
    } //for
    System.out.println("unionDataSetRDD: " + unionDataSetRDD.toDebugString());

Output:

    unionDataSetRDD: UnionRDD[6] at union at DatasetServiceImpl.java:174 (128 partitions)
      UnionRDD[4] at union at DatasetServiceImpl.java:174 (96 partitions)
        UnionRDD[2] at union at DatasetServiceImpl.java:174 (64 partitions)
          ParallelCollectionRDD[0] at parallelize at DatasetServiceImpl.java:167 (32 partitions)
          ParallelCollectionRDD[1] at parallelize at DatasetServiceImpl.java:172 (32 partitions)
        ParallelCollectionRDD[3] at parallelize at DatasetServiceImpl.java:172 (32 partitions)
      ParallelCollectionRDD[5] at parallelize at DatasetServiceImpl.java:172 (32 partitions)

What is interesting is that my final unionDataSetRDD ended up with 128 partitions. I thought it would keep the 32 partitions I explicitly set in parallelize(). Does the above make sense?

Thanks,
Hussam
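The partition arithmetic Matei describes can be traced with a small, self-contained Java toy model. This is an illustrative assumption, not Spark code: each "dataset" is just a list of partitions, `union` concatenates the partition lists, and `coalesce` merges neighbouring partitions into at most n groups (Spark's real coalesce also weighs data locality). The class name `UnionPartitionModel`, its helper methods, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Toy model for illustration only; this is NOT Spark's API or implementation.
// A "dataset" is a list of partitions, each partition a list of elements.
class UnionPartitionModel {

    // Hypothetical stand-in for parallelize(data, numSlices): split data into
    // numSlices contiguous slices (some may be empty when data is small).
    static <T> List<List<T>> parallelize(List<T> data, int numSlices) {
        List<List<T>> parts = new ArrayList<>();
        for (int s = 0; s < numSlices; s++) {
            int start = (int) ((long) s * data.size() / numSlices);
            int end = (int) ((long) (s + 1) * data.size() / numSlices);
            parts.add(new ArrayList<>(data.subList(start, end)));
        }
        return parts;
    }

    // Union as described in the thread: keep every partition of both inputs,
    // in order, with no merging and no de-duplication.
    static <T> List<List<T>> union(List<List<T>> a, List<List<T>> b) {
        List<List<T>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Simplified stand-in for coalesce(n) without a shuffle: merge runs of
    // neighbouring partitions so that at most n partitions remain.
    static <T> List<List<T>> coalesce(List<List<T>> parts, int n) {
        int target = Math.min(n, parts.size());
        List<List<T>> out = new ArrayList<>();
        for (int g = 0; g < target; g++) {
            int from = g * parts.size() / target;
            int to = (g + 1) * parts.size() / target;
            List<T> merged = new ArrayList<>();
            for (List<T> p : parts.subList(from, to)) {
                merged.addAll(p);
            }
            out.add(merged);
        }
        return out;
    }

    // Total number of elements across all partitions.
    static <T> int count(List<List<T>> parts) {
        int total = 0;
        for (List<T> p : parts) total += p.size();
        return total;
    }

    // Number of distinct elements, i.e. what union() followed by distinct() would keep.
    static <T> int distinctCount(List<List<T>> parts) {
        HashSet<T> seen = new HashSet<>();
        for (List<T> p : parts) seen.addAll(p);
        return seen.size();
    }

    public static void main(String[] args) {
        List<Integer> finalresult = new ArrayList<>(); // hypothetical sample data
        for (int k = 0; k < 100; k++) finalresult.add(k);

        List<List<Integer>> unionDataSet = null;
        for (int i = 0; i < 4; i++) { // four datasets of 32 partitions each, as in the thread
            List<List<Integer>> dataSet = parallelize(finalresult, 32);
            unionDataSet = (i == 0) ? dataSet : union(unionDataSet, dataSet);
            System.out.println("after dataset " + (i + 1) + ": " + unionDataSet.size() + " partitions");
        }

        List<List<Integer>> coalesced = coalesce(unionDataSet, 32);
        System.out.println("after coalesce(32): " + coalesced.size() + " partitions, "
                + count(coalesced) + " elements");
        System.out.println("distinct elements: " + distinctCount(unionDataSet));
    }
}
```

Running `main` prints partition counts of 32, 64, 96 and 128 as each dataset is unioned in, matching the debug output above, then 32 partitions still holding all 400 elements after coalescing. Only 100 of those elements are distinct, which is the gap between a plain union and union() followed by distinct(). On a real JavaRDD the corresponding step would be calling coalesce(32) on the unioned RDD, as Matei suggests.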
