Parallelize on spark context

Naveen Kumar Pokala Thu, 06 Nov 2014 22:44:03 -0800

Hi,

JavaRDD<Integer> distData = sc.parallelize(data);


On what basis does parallelize split the data into multiple datasets? How can 
we control how many of these datasets are executed per executor?

For example, my data is a list of 1000 integers and I have a 2-node YARN 
cluster. It is dividing the data into 2 batches of 500 each.
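
To make the observation concrete, here is a minimal sketch in plain Java (no Spark dependency) of the splitting described above. It assumes the list is cut into contiguous ranges of near-equal size, which would match the 2 x 500 result. SliceSketch and slice are hypothetical names used only for illustration; they are not Spark APIs. On the Spark side, the two-argument overload sc.parallelize(data, numSlices) lets the caller request a specific number of slices instead of relying on the default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: cuts a list into numSlices
// contiguous ranges of near-equal size, as an assumed model of parallelize.
class SliceSketch {

    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        List<List<T>> slices = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < numSlices; i++) {
            // Integer boundaries: slice i covers [i*n/numSlices, (i+1)*n/numSlices).
            int start = (int) ((long) i * n / numSlices);
            int end = (int) ((long) (i + 1) * n / numSlices);
            slices.add(new ArrayList<>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        // Print the slice sizes for a few slice counts.
        for (int numSlices : new int[] {2, 3, 4}) {
            StringBuilder sizes = new StringBuilder();
            for (List<Integer> s : slice(data, numSlices)) {
                sizes.append(s.size()).append(' ');
            }
            System.out.println(numSlices + " slices: " + sizes.toString().trim());
        }
    }
}
```

With 1000 elements, this sketch gives 500 and 500 for 2 slices, and 333, 333 and 334 for 3 slices. Which executor each slice runs on is a separate scheduling matter that the sketch does not model.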

Regards,
Naveen.
