Hi,

JavaRDD<Integer> distData = sc.parallelize(data);

On what basis does parallelize split the data into multiple datasets
(partitions)? And how can we control how many of these partitions are
processed per executor?

For example, my data is a list of 1000 integers and I have a 2-node YARN
cluster. The data is being divided into 2 batches of 500 each.
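
To make the behaviour I am describing concrete, here is a small plain-Java
sketch of an even, contiguous split. It is only an illustration: the class
SliceSketch and the method slice are hypothetical names of my own, not Spark
APIs, and the index arithmetic is my assumption about how a list could be cut
into a given number of ranges.

```java
import java.util.ArrayList;
import java.util.List;

public class SliceSketch {

    // Hypothetical helper (not part of Spark): cuts a list into numSlices
    // contiguous ranges whose sizes differ by at most one element.
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        List<List<T>> slices = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < numSlices; i++) {
            // Integer arithmetic on long values to avoid overflow for large lists.
            int start = (int) ((long) i * n / numSlices);
            int end = (int) ((long) (i + 1) * n / numSlices);
            slices.add(new ArrayList<>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        // Two slices, matching what I see on the 2-node cluster: prints 500 twice.
        for (List<Integer> part : slice(data, 2)) {
            System.out.println(part.size());
        }
        // Four slices: prints 250 four times.
        for (List<Integer> part : slice(data, 4)) {
            System.out.println(part.size());
        }
    }
}
```

In the Spark job itself, I believe the slice count can be passed as a second
argument, e.g. sc.parallelize(data, 4), but I am not sure how those slices are
then assigned to executors.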

Regards,
Naveen.
