Hi,

JavaRDD<Integer> distData = sc.parallelize(data);
On what basis does parallelize split the data into multiple datasets? How can we control how many of these datasets are executed per executor? For example, my data is a list of 1000 integers and I have a 2-node YARN cluster. It is dividing the list into 2 batches of 500 each.

Regards,
Naveen.
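
To make the split described above concrete, here is a minimal plain-Java sketch of cutting a list into contiguous, near-equal slices by index arithmetic. It reproduces the 2 x 500 result for a 1000-element list. The class SliceSketch and the method slice are hypothetical names made up for this illustration and are not part of the Spark API; that Spark slices a local collection in essentially this way is my understanding, not something confirmed in this thread. In Spark itself the number of slices can be passed as a second argument, sc.parallelize(data, numSlices); when it is omitted, a default parallelism value is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Spark API.
class SliceSketch {

    // Cut data into numSlices contiguous slices whose sizes differ by at most one.
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be at least 1");
        }
        List<List<T>> slices = new ArrayList<>();
        long n = data.size();
        for (int i = 0; i < numSlices; i++) {
            // Slice i covers indices [start, end) of the original list.
            int start = (int) ((i * n) / numSlices);
            int end = (int) (((i + 1) * n) / numSlices);
            slices.add(new ArrayList<>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        // Two slices, as in the 2-node case described above.
        for (List<Integer> s : slice(data, 2)) {
            System.out.println(s.size()); // 500, then 500
        }
        // Three slices: the remainder ends up in the last slice.
        for (List<Integer> s : slice(data, 3)) {
            System.out.println(s.size()); // 333, 333, then 334
        }
    }
}
```

The slices are contiguous, so with 2 slices the first holds elements 0 to 499 and the second holds 500 to 999.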