Hi lihu,

Maybe the data you're accessing is in HDFS and only resides on 4 of your 20 machines because it's only about 4 blocks (at the default 64 MB per block, that's around a quarter of a GB). Where is your source data located and how is it stored?
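To make the arithmetic concrete, here is a rough sketch in Python. The function names are made up for illustration, and it assumes one task per HDFS block and ignores block replication, so treat the result as a back-of-the-envelope estimate rather than what the scheduler will actually do on your cluster.

```python
MB = 1024 * 1024

def hdfs_block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Number of HDFS blocks needed to hold a file (ceiling division)."""
    if file_size_bytes <= 0:
        return 0
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

def max_busy_machines(file_size_bytes, cluster_size, block_size_bytes=64 * MB):
    """Upper bound on machines that receive a task, assuming one task per block."""
    return min(hdfs_block_count(file_size_bytes, block_size_bytes), cluster_size)

# A quarter-GB input on a 20-machine cluster with 64 MB blocks.
print(hdfs_block_count(256 * MB))
print(max_busy_machines(256 * MB, 20))
```

If the estimate comes out well below 20 for your input, the small number of blocks is a likely explanation for the idle machines.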
Andrew

On Thu, Jan 2, 2014 at 7:53 AM, lihu <[email protected]> wrote:
> Hi,
> I run Spark on a cluster with 20 machines, but when I start an
> application using the spark-shell, only 4 machines are working; the
> others are just idle, with no memory or CPU used. I watched this through
> the web UI.
>
> I wondered whether the other machines might be busy, so I checked them
> using the "top" and "free" commands, but that is not the case.
>
> *So I just wonder why Spark does not assign work to all 20
> machines? This is not good resource usage.*
