Re: Running multiple threads with same Spark Context
I am not sure whether your issue is with setting FAIR mode correctly or with something else, so let's start with FAIR mode. Do you see the scheduler mode actually being set to FAIR?

I have this line in spark-defaults.conf:

    spark.scheduler.allocation.file=/spark/conf/fairscheduler.xml

Then, when I start my application, I can see that it is using that scheduler in the UI: go to the master UI and click on your application. You should then see the scheduling mode shown as Fair.

On Wed, Feb 25, 2015 at 4:06 AM, Harika Matha wrote:
> Hi Yana,
>
> I tried running the program after setting the property
> "spark.scheduler.mode" to FAIR. But the result is same as previous. Are
> there any other properties that have to be set?
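For reference, the Spark job scheduling documentation describes `spark.scheduler.mode` as the property that switches an application to fair scheduling, while `spark.scheduler.allocation.file` only points at the pool definitions. The following is a minimal sketch, not taken from the thread, of setting both in code and printing what the running context reports. The application name and the pool name `reports` are hypothetical, and the master URL is assumed to come from spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FairModeCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: the master URL is supplied by spark-submit.
    val conf = new SparkConf()
      .setAppName("fair-mode-check") // hypothetical application name
      .set("spark.scheduler.mode", "FAIR")
      // Path taken from the message above; the file describes the pools.
      .set("spark.scheduler.allocation.file", "/spark/conf/fairscheduler.xml")

    val sc = new SparkContext(conf)
    try {
      // Print what the running context actually uses.
      println(s"spark.scheduler.mode = ${sc.getConf.get("spark.scheduler.mode", "FIFO")}")
      println(s"scheduling mode      = ${sc.getSchedulingMode}")

      // Optional and per thread: jobs submitted from this thread go to the named pool.
      // "reports" is a hypothetical pool name used only for illustration.
      sc.setLocalProperty("spark.scheduler.pool", "reports")
    } finally {
      sc.stop()
    }
  }
}
```

If the printed mode is still FIFO, the property is probably not reaching the application, which would match the symptom of nothing changing after the setting was added.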
Re: Running multiple threads with same Spark Context
Hi Yana,

I tried running the program after setting the property "spark.scheduler.mode" to FAIR, but the result is the same as before. Are there any other properties that have to be set?

On Tue, Feb 24, 2015 at 10:26 PM, Yana Kadiyska wrote:
> It's hard to tell. I have not run this on EC2 but this worked for me:
>
> The only thing that I can think of is that the scheduling mode is set to
>
> - *Scheduling Mode:* FAIR
>
> [...]
Re: Running multiple threads with same Spark Context
It's hard to tell. I have not run this on EC2, but the code below worked for me. The only thing that I can think of is that the scheduling mode is set to FAIR:

  - *Scheduling Mode:* FAIR

    import java.util.concurrent.{ExecutorService, Executors}
    import scala.compat.Platform
    import org.apache.spark.Logging

    val pool: ExecutorService = Executors.newFixedThreadPool(poolSize)
    // while_loop to get curr_job
    pool.execute(new ReportJob(sqlContext, curr_job, i))

    class ReportJob(sqlContext: org.apache.spark.sql.hive.HiveContext, query: String, id: Int)
        extends Runnable with Logging {
      def threadId = Thread.currentThread.getName() + "\t"

      def run(): Unit = {
        logInfo(s"* Running ${threadId} ${id}")
        val startTime = Platform.currentTime
        val hiveQuery = query
        val result_set = sqlContext.sql(hiveQuery)
        // repartition returns a new result instead of changing result_set,
        // so it is chained into the save for it to take effect.
        result_set.repartition(1).saveAsParquetFile(s"hdfs:///tmp/${id}")
        logInfo(s"* DONE ${threadId} ${id} time: " + (Platform.currentTime - startTime))
      }
    }

On Tue, Feb 24, 2015 at 4:04 AM, Harika wrote:
> Hi all,
>
> I have been running a simple SQL program on Spark. To test the concurrency,
> I have created 10 threads inside the program, all threads using same
> SQLContext object. When I ran the program on my EC2 cluster using
> spark-submit, only 3 threads were running in parallel. I have repeated the
> test on different EC2 clusters (containing different number of cores) and
> found out that only 3 threads are running in parallel on every cluster.
>
> Why is this behaviour seen? What does this number 3 specify?
> Is there any configuration parameter that I have to set if I want to run
> more threads concurrently?
>
> Thanks
> Harika
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-multiple-threads-with-same-Spark-Context-tp21784.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
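Independent of the scheduler mode, the driver-side thread pool is one place where concurrency is capped: a fixed pool never runs more tasks at once than its size. The sketch below is not from the thread and does not explain the number 3; it is only a way to check how many submitted jobs a pool really runs at the same time. `Thread.sleep` stands in for the `sqlContext.sql(...)` call and the save, and all names (`PoolConcurrencyCheck`, `peakConcurrency`, `workMillis`) are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object PoolConcurrencyCheck {
  /** Runs `jobs` dummy tasks on a fixed pool and returns the peak number running at once. */
  def peakConcurrency(poolSize: Int, jobs: Int, workMillis: Long): Int = {
    val pool    = Executors.newFixedThreadPool(poolSize)
    val running = new AtomicInteger(0)
    val peak    = new AtomicInteger(0)
    val done    = new CountDownLatch(jobs)

    for (_ <- 1 to jobs) {
      pool.execute(new Runnable {
        def run(): Unit = {
          val now = running.incrementAndGet()
          // Record the highest concurrency seen so far (lock-free max).
          var prev = peak.get()
          while (now > prev && !peak.compareAndSet(prev, now)) prev = peak.get()
          try Thread.sleep(workMillis) // stand-in for the Spark query and save
          finally {
            running.decrementAndGet()
            done.countDown()
          }
        }
      })
    }

    done.await() // wait until every task has finished
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    peak.get()
  }

  def main(args: Array[String]): Unit = {
    val peak = peakConcurrency(poolSize = 3, jobs = 10, workMillis = 50)
    println(s"peak concurrency: $peak")
  }
}
```

Applying the same counter pattern inside `ReportJob.run()` would show whether all ten threads enter `run()` at once (pointing at the cluster or scheduler) or whether the limit is already in the pool.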