Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Thanks. I'll try that. Hopefully that will work.

On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin <math...@closetwork.org>
wrote:

> I started with a download of 1.6.0. These days, we use a self-compiled
> 1.6.2.
>
> On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
>> I am trying to think of possible reasons why this could be happening. If
>> the cores are multi-threaded, would that affect the daemons? Was your
>> Spark built from source or downloaded as a binary? That shouldn't
>> technically change anything, though.
>>
>> On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <math...@closetwork.org>
>> wrote:
>>
>>> 1.6.1.
>>>
>>> I have no idea. SPARK_WORKER_CORES should do the same.
>>>
>>> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <ashraag...@gmail.com>
>>> wrote:
>>>
>>>> Which version of Spark are you using? 1.6.1?
>>>>
>>>> Any ideas as to why it is not working in ours?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <math...@closetwork.org
>>>> > wrote:
>>>>
>>>>> 16.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <ashraag...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried what you suggested and started the slave using the following
>>>>>> command:
>>>>>>
>>>>>> start-slave.sh --cores 1 
>>>>>>
>>>>>> But it still seems to start as many pyspark daemons as the number of
>> cores in the node (1 parent and 3 workers). Limiting it via the
>> spark-env.sh file by setting SPARK_WORKER_CORES=1 also didn't help.
>>>>>>
>>>>>> When you said it helped you and limited it to 2 processes in your
>>>>>> cluster, how many cores did each machine have?
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <
>>>>>> math...@closetwork.org> wrote:
>>>>>>
>>>>>>> It depends on what you want to do:
>>>>>>>
>>>>>>> If, on any given server, you don't want Spark to use more than one
>>>>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>>>>> --cores=1
>>>>>>>
>>>>>>> If you have a bunch of servers dedicated to Spark, but you don't
>>>>>>> want a driver to use more than one core per server, then: 
>>>>>>> spark.executor.cores=1
>>>>>>> tells it not to use more than 1 core per server. However, it seems
>>>>>>> it will start as many pyspark daemons as there are cores, but maybe
>>>>>>> not use them.
>>>>>>>
>>>>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <ashraag...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Mathieu,
>>>>>>>>
>>>>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>>>>> can I specify "--cores=1" from the application?
>>>>>>>>
>>>>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>>>>> math...@closetwork.org> wrote:
>>>>>>>>
>>>>>>>>> When running the executor, put --cores=1. We use this and I only
>>>>>>>>> see 2 pyspark processes; one seems to be the parent of the other
>>>>>>>>> and is idle.
>>>>>>>>>
>>>>>>>>> In your case, are all the pyspark processes working?
>>>>>>>>>
>>>>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <ashraag...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>>>>> application is run, the load on the workers seems to be higher
>>>>>>>>>> than what was allocated. When I ran top, I noticed that there were
>>>>>>>>>> too many Pyspark.daemons processes running.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
I am trying to think of possible reasons why this could be happening. If
the cores are multi-threaded, would that affect the daemons? Was your Spark
built from source or downloaded as a binary? That shouldn't technically
change anything, though.

On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin <math...@closetwork.org>
wrote:

> 1.6.1.
>
> I have no idea. SPARK_WORKER_CORES should do the same.
>
> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
>> Which version of Spark are you using? 1.6.1?
>>
>> Any ideas as to why it is not working in ours?
>>
>> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <math...@closetwork.org>
>> wrote:
>>
>>> 16.
>>>
>>> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <ashraag...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried what you suggested and started the slave using the following
>>>> command:
>>>>
>>>> start-slave.sh --cores 1 
>>>>
>>>> But it still seems to start as many pyspark daemons as the number of
>>>> cores in the node (1 parent and 3 workers). Limiting it via the
>>>> spark-env.sh file by setting SPARK_WORKER_CORES=1 also didn't help.
>>>>
>>>> When you said it helped you and limited it to 2 processes in your
>>>> cluster, how many cores did each machine have?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <math...@closetwork.org
>>>> > wrote:
>>>>
>>>>> It depends on what you want to do:
>>>>>
>>>>> If, on any given server, you don't want Spark to use more than one
>>>>> core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh
>>>>> --cores=1
>>>>>
>>>>> If you have a bunch of servers dedicated to Spark, but you don't want
>>>>> a driver to use more than one core per server, then: 
>>>>> spark.executor.cores=1
>>>>> tells it not to use more than 1 core per server. However, it seems it will
>>>>> start as many pyspark daemons as there are cores, but maybe not use them.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <ashraag...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Mathieu,
>>>>>>
>>>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how
>>>>>> can I specify "--cores=1" from the application?
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <
>>>>>> math...@closetwork.org> wrote:
>>>>>>
>>>>>>> When running the executor, put --cores=1. We use this and I only see
>>>>>>> 2 pyspark processes; one seems to be the parent of the other and is
>>>>>>> idle.
>>>>>>>
>>>>>>> In your case, are all the pyspark processes working?
>>>>>>>
>>>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <ashraag...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>>>> application is run, the load on the workers seems to be higher than
>>>>>>>> what was allocated. When I ran top, I noticed that there were too
>>>>>>>> many Pyspark.daemons processes running. There was another mail
>>>>>>>> thread regarding the same:
>>>>>>>>
>>>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E
>>>>>>>>
>>>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>>>> executor cores and the number of executors per node to 1. But the
>>>>>>>> number of pyspark.daemons processes is still not coming down. It looks
>>>>>>>> like initially there is one Pyspark.daemons process and this in turn
>>>>>>>> spawns as many pyspark.daemons processes as the number of cores in the
>>>>>>>> machine.
>>>>>>>>
>>>>>>>> Any help is appreciated :)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ashwin Raaghav.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Ashwin Raaghav
>>>>>>
>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ashwin Raaghav
>>>>
>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav


Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Which version of Spark are you using? 1.6.1?

Any ideas as to why it is not working in ours?

On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin <math...@closetwork.org>
wrote:

> 16.
>
> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I tried what you suggested and started the slave using the following
>> command:
>>
>> start-slave.sh --cores 1 
>>
>> But it still seems to start as many pyspark daemons as the number of
>> cores in the node (1 parent and 3 workers). Limiting it via the
>> spark-env.sh file by setting SPARK_WORKER_CORES=1 also didn't help.
>>
>> When you said it helped you and limited it to 2 processes in your
>> cluster, how many cores did each machine have?
>>
>> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <math...@closetwork.org>
>> wrote:
>>
>>> It depends on what you want to do:
>>>
>>> If, on any given server, you don't want Spark to use more than one core,
>>> use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1
>>>
>>> If you have a bunch of servers dedicated to Spark, but you don't want a
>>> driver to use more than one core per server, then: spark.executor.cores=1
>>> tells it not to use more than 1 core per server. However, it seems it will
>>> start as many pyspark daemons as there are cores, but maybe not use them.
>>>
>>> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <ashraag...@gmail.com>
>>> wrote:
>>>
>>>> Hi Mathieu,
>>>>
>>>> Isn't that the same as setting "spark.executor.cores" to 1? And how can
>>>> I specify "--cores=1" from the application?
>>>>
>>>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <math...@closetwork.org
>>>> > wrote:
>>>>
>>>>> When running the executor, put --cores=1. We use this and I only see 2
>>>>> pyspark processes; one seems to be the parent of the other and is idle.
>>>>>
>>>>> In your case, are all the pyspark processes working?
>>>>>
>>>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <ashraag...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>>>> application is run, the load on the workers seems to be higher than
>>>>>> what was allocated. When I ran top, I noticed that there were too many
>>>>>> Pyspark.daemons processes running. There was another mail thread
>>>>>> regarding the same:
>>>>>>
>>>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E
>>>>>>
>>>>>> I followed what was mentioned there, i.e. reduced the number of
>>>>>> executor cores and the number of executors per node to 1. But the
>>>>>> number of pyspark.daemons processes is still not coming down. It looks
>>>>>> like initially there is one Pyspark.daemons process and this in turn
>>>>>> spawns as many pyspark.daemons processes as the number of cores in the
>>>>>> machine.
>>>>>>
>>>>>> Any help is appreciated :)
>>>>>>
>>>>>> Thanks,
>>>>>> Ashwin Raaghav.
>>>>>>
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ashwin Raaghav
>>>>
>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav


Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Hi,

I tried what you suggested and started the slave using the following
command:

start-slave.sh --cores 1 

But it still seems to start as many pyspark daemons as the number of cores
in the node (1 parent and 3 workers). Limiting it via the spark-env.sh file
by setting SPARK_WORKER_CORES=1 also didn't help.
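For reference, the two things I tried look roughly like this (the master URL
is a placeholder):

    # conf/spark-env.sh on the worker node
    export SPARK_WORKER_CORES=1

    # or, equivalently, when starting the worker by hand
    $SPARK_HOME/sbin/start-slave.sh --cores 1 spark://<master-host>:7077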

When you said it helped you and limited it to 2 processes in your cluster,
how many cores did each machine have?

On Mon, Jul 4, 2016 at 8:22 PM, Mathieu Longtin <math...@closetwork.org>
wrote:

> It depends on what you want to do:
>
> If, on any given server, you don't want Spark to use more than one core,
> use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1
>
> If you have a bunch of servers dedicated to Spark, but you don't want a
> driver to use more than one core per server, then: spark.executor.cores=1
> tells it not to use more than 1 core per server. However, it seems it will
> start as many pyspark daemons as there are cores, but maybe not use them.
>
> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav <ashraag...@gmail.com>
> wrote:
>
>> Hi Mathieu,
>>
>> Isn't that the same as setting "spark.executor.cores" to 1? And how can I
>> specify "--cores=1" from the application?
>>
>> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <math...@closetwork.org>
>> wrote:
>>
>>> When running the executor, put --cores=1. We use this and I only see 2
>>> pyspark processes; one seems to be the parent of the other and is idle.
>>>
>>> In your case, are all the pyspark processes working?
>>>
>>> On Mon, Jul 4, 2016 at 3:15 AM ar7 <ashraag...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>>>> application is run, the load on the workers seems to be higher than what
>>>> was allocated. When I ran top, I noticed that there were too many
>>>> Pyspark.daemons processes running. There was another mail thread
>>>> regarding the same:
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E
>>>>
>>>> I followed what was mentioned there, i.e. reduced the number of executor
>>>> cores and the number of executors per node to 1. But the number of
>>>> pyspark.daemons processes is still not coming down. It looks like
>>>> initially there is one Pyspark.daemons process and this in turn spawns as
>>>> many pyspark.daemons processes as the number of cores in the machine.
>>>>
>>>> Any help is appreciated :)
>>>>
>>>> Thanks,
>>>> Ashwin Raaghav.
>>>>
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Ashwin Raaghav
>>
> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav


Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Hi Mathieu,

Isn't that the same as setting "spark.executor.cores" to 1? And how can I
specify "--cores=1" from the application?

On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin <math...@closetwork.org>
wrote:

> When running the executor, put --cores=1. We use this and I only see 2
> pyspark processes; one seems to be the parent of the other and is idle.
>
> In your case, are all the pyspark processes working?
>
> On Mon, Jul 4, 2016 at 3:15 AM ar7 <ashraag...@gmail.com> wrote:
>
>> Hi,
>>
>> I am currently using PySpark 1.6.1 in my cluster. When a pyspark
>> application is run, the load on the workers seems to be higher than what
>> was allocated. When I ran top, I noticed that there were too many
>> Pyspark.daemons processes running. There was another mail thread regarding
>> the same:
>>
>> https://mail-archives.apache.org/mod_mbox/spark-user/201606.mbox/%3ccao429hvi3drc-ojemue3x4q1vdzt61htbyeacagtre9yrhs...@mail.gmail.com%3E
>>
>> I followed what was mentioned there, i.e. reduced the number of executor
>> cores and the number of executors per node to 1. But the number of
>> pyspark.daemons processes is still not coming down. It looks like initially
>> there is one Pyspark.daemons process and this in turn spawns as many
>> pyspark.daemons processes as the number of cores in the machine.
>>
>> Any help is appreciated :)
>>
>> Thanks,
>> Ashwin Raaghav.
>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Regards,

Ashwin Raaghav


Re: Adding h5 files in a zip to use with PySpark

2016-06-15 Thread Ashwin Raaghav
Thanks! That worked! :)

And to read the files, I used the pyspark.SparkFiles module.
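Roughly, what ended up working, in case it helps anyone else (file names are
illustrative, and h5py is assumed to be installed on the workers):

    spark-submit --py-files modules.zip --files /path/to/weights.h5 my_app.py

and then, inside the module that needs the file:

    from pyspark import SparkFiles
    import h5py  # assumed to be available on each worker

    def load_weight_names():
        # SparkFiles.get resolves the local copy of the file shipped
        # with --files on whichever node this code runs on.
        with h5py.File(SparkFiles.get("weights.h5"), "r") as f:
            return list(f.keys())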


On Thu, Jun 16, 2016 at 7:12 AM, Sun Rui <sunrise_...@163.com> wrote:

> Have you tried --files?
> > On Jun 15, 2016, at 18:50, ar7 <ashraag...@gmail.com> wrote:
> >
> > I am using PySpark 1.6.1 for my spark application. I have additional
> > modules which I am loading using the argument --py-files. I also have an
> > h5 file which I need to access from one of the modules for initializing
> > the ApolloNet.
> >
> > Is there any way I could access those files from the modules if I put
> > them in the same archive? I tried this approach, but it was throwing an
> > error because the files are not present on every worker. I can think of
> > one solution, which is copying the file to each of the workers, but I
> > want to know if there are better ways to do it.
> >
> >
> >
>
>
>


-- 
Regards,

Ashwin Raaghav