Re: Spark Job not using all nodes in cluster
No, I am not setting the number of executors anywhere (neither in the env file nor in the program). Could it be due to the large number of small files?

On Wed, May 20, 2015 at 5:11 PM, ayan guha guha.a...@gmail.com wrote:
> What does your spark-env file say? Are you setting the number of executors in the Spark context? [...]
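Not the thread's confirmed fix, but a common mitigation for the many-tiny-files pattern is to cut the partition count at read time, so each task reads many files instead of one. A minimal Spark 1.3-era Scala sketch, assuming a hypothetical HDFS path and a running cluster:

```scala
// Sketch only -- requires a running Spark 1.3.x cluster; the path and
// partition counts are illustrative assumptions, not from the thread.
import org.apache.spark.{SparkConf, SparkContext}

object SmallFilesRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SmallFilesRead"))

    // textFile() yields roughly one partition per tiny file (one split per
    // file when each file is far below the HDFS block size). coalesce() is
    // a narrow dependency (no shuffle), so each of the 64 tasks reads many
    // files, amortizing per-task overhead.
    val lines = sc.textFile("hdfs:///data/json/*").coalesce(64)

    // Alternatively, wholeTextFiles() packs many small files into few
    // partitions up front, returning (filename, content) pairs.
    val files = sc.wholeTextFiles("hdfs:///data/json/", minPartitions = 64)

    println(s"${lines.count()} lines, ${files.count()} files")
    sc.stop()
  }
}
```

Note that `minPartitions` is a hint rather than an exact count; Spark may produce more partitions depending on file layout.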
Re: Spark Job not using all nodes in cluster
What does your spark-env file say? Are you setting the number of executors in the Spark context?

On 20 May 2015 13:16, Shailesh Birari sbirar...@gmail.com wrote:
> Hi, I have a 4-node Spark 1.3.1 cluster. [...]
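For reference, in Spark 1.3 the executor count is usually fixed at submit time rather than in spark-env.sh alone. Hedged examples of the relevant spark-submit flags; the class name, jar, and master URL are placeholders:

```shell
# Hypothetical submit commands -- adjust class, jar, and master for your app.
# On YARN, the executor count is explicit:
spark-submit --master yarn-client \
  --num-executors 4 --executor-cores 4 --executor-memory 48g \
  --class com.example.ReadJson app.jar

# On a standalone master there is no --num-executors; cap total cores
# instead. With the default spark.deploy.spreadOut=true, the standalone
# scheduler spreads executors across workers:
spark-submit --master spark://master:7077 \
  --total-executor-cores 16 --executor-memory 48g \
  --class com.example.ReadJson app.jar
```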
Spark Job not using all nodes in cluster
Hi,

I have a 4-node Spark 1.3.1 cluster. Each node has 4 cores and 64 GB of RAM. There are around 600,000+ JSON files on HDFS; each file is small, around 1 KB, and the total data is around 16 GB. The Hadoop block size is 256 MB.

My application reads these files with the sc.textFile() API (I also tried sc.jsonFile()), but all of the files are read by only one node (4 executors): the Spark UI shows all 600K+ tasks on one node and 0 on the other nodes. I have confirmed that all files are accessible from all nodes, and another application that uses large files does use all nodes on the same cluster.

Can you please let me know why it behaves this way?

Thanks,
Shailesh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-using-all-nodes-in-cluster-tp22951.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
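One side note on the API mentioned above: in Spark 1.3, jsonFile lives on SQLContext rather than SparkContext, so "sc.jsonFile()" most likely refers to the following pattern. A minimal sketch, assuming a hypothetical path and an existing SparkContext `sc`:

```scala
// Spark 1.3.x sketch -- needs a running cluster; path is hypothetical.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// jsonFile() infers the schema by scanning the input and returns a DataFrame.
// Like textFile(), it creates one split per tiny file, so the same
// 600K-task behavior applies here.
val df = sqlContext.jsonFile("hdfs:///data/json/*")
df.printSchema()
```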