Hi Jia,
I think the examples you provided is not very suitable to illustrate what
driver and executors do, because it's not show the internal implementation
of the KMeans algorithm.
You can refer the source code of MLlib Kmeans (
Thanks, Yanbo.
The results become much more reasonable, after I set driver memory to 5GB
and increase worker memory to 25GB.
So, my question is for following code snippet extracted from main method in
JavaKMeans.java in examples, what will the driver do? and what will the
worker do?
I didn't
Hi Jia,
You can try to use inputRDD.persist(MEMORY_AND_DISK) and verify whether it
can produce stable performance. The storage level of MEMORY_AND_DISK will
store the partitions that don't fit on disk and read them from there when
they are needed.
Actually, it's not necessary to set so large
I am running Spark MLLib KMeans in one EC2 M3.2xlarge instance with 8 CPU
cores and 30GB memory. Executor memory is set to 15GB, and driver memory is
set to 15GB.
The observation is that, when input data size is smaller than 15GB, the
performance is quite stable. However, when input data becomes