Hi,

I am creating an initial JavaRDD with 32 partitions, then looping over my data 
and unioning each new RDD with the initial one. My code is as follows:
JavaRDD<String> dataSetRDD = null;
JavaRDD<String> unionDataSetRDD = null;
for (..) {
    if (0 == i) {
        // First iteration: create the initial RDD with 32 partitions.
        unionDataSetRDD = SparkDriver.getSparkContext().parallelize(finalresult, 32);
    } else {
        // Later iterations: create another 32-partition RDD and union it in.
        dataSetRDD = SparkDriver.getSparkContext().parallelize(finalresult, 32);
        unionDataSetRDD = unionDataSetRDD.union(dataSetRDD);
    }
} // for

System.out.println("unionDataSetRDD: " + unionDataSetRDD.toDebugString());

Output
unionDataSetRDD: UnionRDD[6] at union at DatasetServiceImpl.java:174 (128 partitions)
  UnionRDD[4] at union at DatasetServiceImpl.java:174 (96 partitions)
    UnionRDD[2] at union at DatasetServiceImpl.java:174 (64 partitions)
      ParallelCollectionRDD[0] at parallelize at DatasetServiceImpl.java:167 (32 partitions)
      ParallelCollectionRDD[1] at parallelize at DatasetServiceImpl.java:172 (32 partitions)
    ParallelCollectionRDD[3] at parallelize at DatasetServiceImpl.java:172 (32 partitions)
  ParallelCollectionRDD[5] at parallelize at DatasetServiceImpl.java:172 (32 partitions)
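
For reference, the partition counts in the output above (64, 96, 128) are what you get if each union simply adds the partition counts of its two inputs. Below is a minimal plain-Java sketch of that arithmetic only. It does not use Spark, and the class and method names (UnionPartitionSketch, unionPartitions, runningCounts) are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionPartitionSketch {

    // Assumption for this sketch: a plain union keeps one partition per
    // input partition, so the partition counts of the two inputs add up.
    public static int unionPartitions(int left, int right) {
        return left + right;
    }

    // Partition count of the running union after each loop iteration.
    // Iteration 0 mirrors the first parallelize call; later iterations
    // mirror parallelize followed by union.
    public static List<Integer> runningCounts(int partitionsPerRdd, int iterations) {
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < iterations; i++) {
            current = (i == 0)
                    ? partitionsPerRdd
                    : unionPartitions(current, partitionsPerRdd);
            counts.add(current);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four RDDs of 32 partitions each, as in the debug string above.
        System.out.println(runningCounts(32, 4)); // prints [32, 64, 96, 128]
    }
}
```

If the goal is to end up with 32 partitions again, one option that may fit is calling coalesce(32) on the final unioned RDD; whether that is appropriate depends on the data size and on whether a shuffle is acceptable.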

The interesting thing is that my final unionDataSetRDD ends up with 128 
partitions. I thought it would keep the 32 partitions I explicitly set in 
parallelize.

Does the above make sense?
Thanks,
Hussam
