Re: Spark Job not using all nodes in cluster

2015-05-20 Thread Shailesh Birari
No, I am not setting the number of executors anywhere (neither in the env
file nor in the program).

Is it due to the large number of small files?
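
One quick way to check, as a rough sketch (Scala, assuming a spark-shell
session where sc already exists; the HDFS path below is a placeholder, not
the actual one): count the partitions the read produces and ask where HDFS
says each split lives.

// Placeholder path; adjust to the real input directory.
val raw = sc.textFile("hdfs:///data/json/*")

// With ~600,000 files smaller than one HDFS block, this is typically one
// partition (and hence one task) per file.
println(raw.partitions.length)

// Ask where HDFS reports the first few splits live; if every split names
// the same host, locality-based scheduling would explain the single busy node.
raw.partitions.take(5).foreach(p => println(raw.preferredLocations(p)))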

On Wed, May 20, 2015 at 5:11 PM, ayan guha guha.a...@gmail.com wrote:

 What does your spark-env file say? Are you setting the number of executors
 in the Spark context?
 On 20 May 2015 13:16, Shailesh Birari sbirar...@gmail.com wrote:

 Hi,

 I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
 of RAM.
 I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB
 in size. The total data is around 16 GB, and the Hadoop block size is 256 MB.
 My application reads these files with the sc.textFile() (or sc.jsonFile();
 I tried both) API. But all the files are being read by only one node (4
 executors). The Spark UI shows all 600K+ tasks on one node and 0 on the
 other nodes.

 I confirmed that all files are accessible from all nodes. Another
 application, which uses big files, uses all nodes on the same cluster.

 Can you please let me know why it is behaving this way?

 Thanks,
   Shailesh








Re: Spark Job not using all nodes in cluster

2015-05-19 Thread ayan guha
What does your spark-env file say? Are you setting the number of executors in
the Spark context?
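
For reference, a rough sketch of what setting executors explicitly in the
driver could look like (property names assume Spark on YARN; the app name and
values are illustrative, not taken from this thread):

import org.apache.spark.{SparkConf, SparkContext}

// Rough sketch only: explicitly request executors, cores, and memory instead
// of relying on defaults. App name and numbers are placeholders.
val conf = new SparkConf()
  .setAppName("json-ingest")
  .set("spark.executor.instances", "4")   // roughly one executor per node
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "48g")
val sc = new SparkContext(conf)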
On 20 May 2015 13:16, Shailesh Birari sbirar...@gmail.com wrote:

 Hi,

 I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
 of RAM.
 I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB
 in size. The total data is around 16 GB, and the Hadoop block size is 256 MB.
 My application reads these files with the sc.textFile() (or sc.jsonFile();
 I tried both) API. But all the files are being read by only one node (4
 executors). The Spark UI shows all 600K+ tasks on one node and 0 on the
 other nodes.

 I confirmed that all files are accessible from all nodes. Another
 application, which uses big files, uses all nodes on the same cluster.

 Can you please let me know why it is behaving this way?

 Thanks,
   Shailesh








Spark Job not using all nodes in cluster

2015-05-19 Thread Shailesh Birari
Hi,

I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
of RAM.
I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB
in size. The total data is around 16 GB, and the Hadoop block size is 256 MB.
My application reads these files with the sc.textFile() (or sc.jsonFile();
I tried both) API. But all the files are being read by only one node (4
executors). The Spark UI shows all 600K+ tasks on one node and 0 on the
other nodes.

I confirmed that all files are accessible from all nodes. Another
application, which uses big files, uses all nodes on the same cluster.

Can you please let me know why it is behaving this way?

Thanks,
  Shailesh
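
For what it's worth, a hedged sketch of two common ways to keep ~600K tiny
files from becoming ~600K tiny tasks (the path, app name, and partition
counts are placeholders, not taken from this post):

import org.apache.spark.{SparkConf, SparkContext}

// Rough sketch; adjust path and partition counts to the real workload.
val sc = new SparkContext(new SparkConf().setAppName("small-json-files"))

// Option 1: wholeTextFiles packs many small files into each partition via a
// combining input format, so the task count stays manageable.
val packed = sc.wholeTextFiles("hdfs:///data/json", minPartitions = 64)
val contents = packed.values            // keep file contents, drop the paths

// Option 2: read as before, then coalesce the per-file partitions before
// doing any expensive work on them.
val coalesced = sc.textFile("hdfs:///data/json/*").coalesce(64)

sc.wholeTextFiles returns (path, content) pairs, so either option trades
hundreds of thousands of per-file tasks for a handful of larger ones.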




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-using-all-nodes-in-cluster-tp22951.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org