Re: Limiting Pyspark.daemons
Thanks. I'll try that. Hopefully that should work.

On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> I started with a download of 1.6.0. These days, we use a self-compiled 1.6.2.
Re: Limiting Pyspark.daemons
I am thinking of possible reasons why this could be happening. If the cores are multi-threaded, could that affect the daemons? Was your Spark built from source or downloaded as a binary? Though that should not technically change anything.

On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> 1.6.1.
>
> I have no idea. SPARK_WORKER_CORES should do the same.

--
Regards,
Ashwin Raaghav
Re: Limiting Pyspark.daemons
Which version of Spark are you using? 1.6.1?

Any ideas as to why it is not working in ours?

On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> 16.

--
Regards,
Ashwin Raaghav
Re: Limiting Pyspark.daemons
Hi,

I tried what you suggested and started the slave using the following command:

    start-slave.sh --cores 1

But it still seems to start as many pyspark daemons as there are cores in the node (1 parent and 3 workers). Limiting it via the spark-env.sh file with SPARK_WORKER_CORES=1 also didn't help.

When you said it limited Spark to 2 processes in your cluster, how many cores did each machine have?

On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> It depends on what you want to do:
>
> If, on any given server, you don't want Spark to use more than one core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1
>
> If you have a bunch of servers dedicated to Spark, but you don't want a driver to use more than one core per server, then spark.executor.cores=1 tells it not to use more than 1 core per server. However, it seems it will start as many pyspark daemons as there are cores, but maybe not use them.

--
Regards,
Ashwin Raaghav
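For reference, a minimal sketch of setting the same limits from the application side in PySpark. spark.executor.cores and spark.cores.max are standard standalone-mode properties in Spark 1.6; the app name is a placeholder:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("core-limit-test")        # placeholder app name
            .set("spark.executor.cores", "1")     # at most 1 core per executor
            .set("spark.cores.max", "1"))         # total cores the app may take across the cluster (standalone mode)
    sc = SparkContext(conf=conf)

As noted in the quoted reply, this limits how many tasks run concurrently, but the worker may still fork one pyspark.daemon child per core even if the extra ones sit idle.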
Re: Limiting Pyspark.daemons
Hi Mathieu,

Isn't that the same as setting "spark.executor.cores" to 1? And how can I specify "--cores=1" from the application?

On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> When running the executor, put --cores=1. We use this and I only see 2 pyspark processes; one seems to be the parent of the other and is idle.
>
> In your case, are all pyspark processes working?
>
> On Mon, Jul 4, 2016 at 3:15 AM ar7 <ashraag...@gmail.com> wrote:
>> Hi,
>>
>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark application is run, the load on the workers seems to go higher than what was allocated. When I ran top, I noticed that there were too many pyspark.daemon processes running. There was another mail thread regarding the same:
>>
>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E
>>
>> I followed what was mentioned there, i.e. reduced the number of executor cores and the number of executors on one node to 1. But the number of pyspark.daemon processes is still not coming down. It looks like initially there is one pyspark.daemon process, and this in turn spawns as many pyspark.daemon processes as there are cores in the machine.
>>
>> Any help is appreciated :)
>>
>> Thanks,
>> Ashwin Raaghav.

--
Regards,
Ashwin Raaghav
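One way to check whether all of those daemon processes are actually doing work is to have each task report the PID of the Python worker it ran in. A quick diagnostic sketch, assuming a standalone PySpark script; the app name, element count, and partition count are arbitrary:

    import os
    import socket

    from pyspark import SparkContext

    sc = SparkContext(appName="pyspark-daemon-check")   # arbitrary app name

    # Each task records the host it ran on and the PID of its Python worker.
    # The number of distinct PIDs per host shows how many worker processes
    # (children of pyspark.daemon) actually executed tasks, regardless of
    # how many were forked.
    workers = (sc.parallelize(range(1000), 64)
                 .map(lambda _: (socket.gethostname(), os.getpid()))
                 .distinct()
                 .collect())

    for host, pid in sorted(workers):
        print(host, pid)

    sc.stop()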
Re: Adding h5 files in a zip to use with PySpark
Thanks! That worked! :) And to read the files, I used the pyspark.SparkFiles module.

On Thu, Jun 16, 2016 at 7:12 AM, Sun Rui <sunrise_...@163.com> wrote:
> Have you tried --files?
>
> On Jun 15, 2016, at 18:50, ar7 <ashraag...@gmail.com> wrote:
>> I am using PySpark 1.6.1 for my spark application. I have additional modules which I am loading using the argument --py-files. I also have an h5 file which I need to access from one of the modules for initializing the ApolloNet.
>>
>> Is there any way I could access those files from the modules if I put them in the same archive? I tried this approach, but it was throwing an error because the files are not there on every worker. I can think of one solution, which is copying the file to each of the workers, but I want to know if there are better ways to do it.

--
Regards,
Ashwin Raaghav
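A minimal sketch of that pattern, assuming the file was shipped with "--files model.h5" at submit time; the file name is hypothetical, and a plain open() stands in for whatever h5py call the module really makes:

    from pyspark import SparkContext, SparkFiles

    sc = SparkContext(appName="h5-loader")      # placeholder app name

    def load_weights(_):
        # SparkFiles.get() resolves the local path, on this worker, of a
        # file that was distributed with --files (or sc.addFile).
        path = SparkFiles.get("model.h5")       # hypothetical file name
        with open(path, "rb") as f:             # real code would use h5py here
            return [len(f.read())]

    print(sc.parallelize([0]).flatMap(load_weights).collect())

    sc.stop()

The same works without the --files flag by calling sc.addFile("/path/to/model.h5") in the driver before the job runs.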