It's hard to tell. I have not run this on EC2, but the code below worked for me. The only thing that I can think of is that the scheduling mode is set to FAIR:

- *Scheduling Mode:* FAIR

import java.util.concurrent.{ExecutorService, Executors}
import scala.compat.Platform
import org.apache.spark.Logging

val pool: ExecutorService = Executors.newFixedThreadPool(poolSize)

// Pseudocode: loop over the queries to get curr_job (and its index i)
pool.execute(new ReportJob(sqlContext, curr_job, i))

class ReportJob(sqlContext: org.apache.spark.sql.hive.HiveContext, query: String, id: Int)
    extends Runnable with Logging {

  def threadId = Thread.currentThread.getName() + "\t"

  def run() {
    logInfo(s"********************* Running ${threadId} ${id}")
    val startTime = Platform.currentTime
    val hiveQuery = query
    val result_set = sqlContext.sql(hiveQuery)
    // repartition returns a new data set, so chain it into the save;
    // calling it on its own line discards the result
    result_set.repartition(1).saveAsParquetFile(s"hdfs:///tmp/${id}")
    logInfo(s"********************* DONE ${threadId} ${id} time: " + (Platform.currentTime - startTime))
  }
}

On Tue, Feb 24, 2015 at 4:04 AM, Harika <matha.har...@gmail.com> wrote:

> Hi all,
>
> I have been running a simple SQL program on Spark. To test the concurrency,
> I have created 10 threads inside the program, all threads using same
> SQLContext object. When I ran the program on my EC2 cluster using
> spark-submit, only 3 threads were running in parallel. I have repeated the
> test on different EC2 clusters (containing different number of cores) and
> found out that only 3 threads are running in parallel on every cluster.
>
> Why is this behaviour seen? What does this number 3 specify?
> Is there any configuration parameter that I have to set if I want to run
> more threads concurrently?
>
> Thanks
> Harika
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-multiple-threads-with-same-Spark-Context-tp21784.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
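
P.S. For anyone who wants to try the FAIR scheduling mode mentioned in the reply, here is a minimal sketch of one way to turn it on from code. It sets the `spark.scheduler.mode` property on the `SparkConf` before the context is created. The object name and app name are hypothetical placeholders, and the master URL is assumed to come from spark-submit. Whether this setting explains the limit of 3 concurrent threads is not established in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object, used only for illustration.
object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    // "concurrent-reports" is a placeholder app name.
    // The master URL is assumed to be supplied by spark-submit.
    val conf = new SparkConf()
      .setAppName("concurrent-reports")
      .set("spark.scheduler.mode", "FAIR") // Spark's default is FIFO

    val sc = new SparkContext(conf)
    try {
      // Build the SQL/Hive context from sc here and submit jobs from
      // several threads, as in the ReportJob example in the reply.
    } finally {
      sc.stop() // release cluster resources when done
    }
  }
}
```

The same property can also be passed on the command line with `--conf spark.scheduler.mode=FAIR` when calling spark-submit, which avoids hard-coding it.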