RE: configure to run multiple tasks on a core
Indeed. That's nice. Thanks!

yotto
Re: configure to run multiple tasks on a core
Instead of SPARK_WORKER_INSTANCES you can also set SPARK_WORKER_CORES, to have one worker that thinks it has more cores.

Matei
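For example (the numbers here are illustrative, not from the thread): on an 8-core node you could leave SPARK_WORKER_INSTANCES alone and have the single worker advertise twice the physical cores, so the standalone scheduler assigns two of these I/O-bound tasks per physical core:

export SPARK_WORKER_CORES=16   # node actually has 8 physical cores
export SPARK_WORKER_MEMORY=2g  # keep the worker's memory within the node's RAM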
RE: configure to run multiple tasks on a core
Thanks Sean. That worked out well.

For anyone who happens onto this post and wants to do the same, these are the steps I took to do as Sean suggested...

(Note: this is for a standalone cluster.)

log in to the master:

~/spark/sbin/stop-all.sh

edit ~/spark/conf/spark-env.sh and change the line

export SPARK_WORKER_INSTANCES=1

to the multiple you want (e.g. 2).

I also added

export SPARK_WORKER_MEMORY=2g

(or some other reasonable value) so that the total memory used by the workers on a node stays within the memory available on the node.

Then push the config out and restart:

~/spark-ec2/copy-dir /root/spark/conf
~/spark/sbin/start-all.sh
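To illustrate, the relevant spark-env.sh lines after the edit might read (values are just the examples above, not recommendations):

export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_MEMORY=2g

If it worked, the standalone master's web UI (port 8080 by default) should now list two workers per node.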
Re: configure to run multiple tasks on a core
What about running, say, 2 executors per machine, each of which thinks it should use all cores?

You can also multi-thread your map function manually, directly, within your code, with careful use of a java.util.concurrent.Executor.
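To make the second suggestion concrete, here is a minimal Scala sketch of that pattern. Everything in it is an assumption for illustration: an RDD[String] named rdd, a pool size of 4, and runExternalApp as a hypothetical stand-in for invoking the C++ app.

import java.util.concurrent.{Callable, Executors}

// Hypothetical placeholder: replace with the real external-app invocation.
def runExternalApp(record: String): String = record.reverse

val results = rdd.mapPartitions { records =>
  // A small pool per task lets several external calls overlap their
  // download/upload waits while sharing the task's one core.
  val pool = Executors.newFixedThreadPool(4)
  val futures = records.map { rec =>
    pool.submit(new Callable[String] { def call(): String = runExternalApp(rec) })
  }.toList                        // force submission of every record
  val out = futures.map(_.get())  // block until all calls complete
  pool.shutdown()
  out.iterator
}

Note that this buffers a whole partition's futures in memory, and the pool size is something to measure rather than guess: past the point where the core stays busy, more threads just add overhead.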
configure to run multiple tasks on a core
I'm running a spark-ec2 cluster.

I have a map task that calls a specialized C++ external app. The app doesn't fully utilize the core as it needs to download/upload data as part of the task. Looking at the worker nodes, it appears that there is one task with my app running per core.

I'd like to better utilize the CPU resources with the hope of increasing throughput by running multiple tasks (with my app) per core in parallel.

I see there is a spark.task.cpus config setting with a default value of 1. It appears, though, that this is used to go the other way from what I am looking for.

Is there a way I can specify multiple tasks per core rather than multiple cores per task?

thanks for any help.