Hi,

I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB of RAM. I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB; the total data is around 16 GB, and the Hadoop block size is 256 MB. My application reads these files with the sc.textFile() API (I also tried jsonFile()). However, all the files are being read by only one node (4 executors): the Spark UI shows all 600K+ tasks on one node and 0 on the other nodes.
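As a back-of-the-envelope check on that task count (a minimal sketch in plain Python, using the approximate sizes from above rather than measured values): since each 1 KB file is far smaller than one 256 MB HDFS block, the Hadoop input format produces at least one input split, and hence one Spark task, per file.

```python
# Sketch: why sc.textFile() over many tiny files yields one task per file.
# Figures are the approximate ones quoted in the post, not measurements.
BLOCK_SIZE = 256 * 1024 * 1024   # HDFS block size, bytes
FILE_SIZE = 1 * 1024             # ~1 KB per JSON file
NUM_FILES = 600_000

# A file smaller than one block can never be split further,
# so it contributes exactly one input split (= one task).
splits_per_file = max(1, -(-FILE_SIZE // BLOCK_SIZE))  # ceiling division
total_tasks = NUM_FILES * splits_per_file
print(total_tasks)  # 600000 — matching the 600K+ tasks in the Spark UI
```

This only explains the task count, not why all tasks landed on one node; with locality-preferred scheduling, where the tasks run depends on where the executors and HDFS replicas are.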
I confirmed that all files are accessible from all nodes, and another application that uses large files does use all nodes on the same cluster. Can you please let me know why it is behaving this way?

Thanks,
Shailesh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-using-all-nodes-in-cluster-tp22951.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.