Hi Shangyu,

The daemon.py Python process is the actual PySpark worker process; it is launched by the Spark worker whenever a Python job runs. So, when using PySpark, the "real computation" is handled by a Python process (via daemon.py), not by a Java process.
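As a minimal sketch (the master URL and app name below are just placeholders, not your setup): the lambdas get pickled and shipped to the Python worker processes that daemon.py forks on each worker node, while the JVM side mostly moves data in and out of those workers.

    from pyspark import SparkContext

    # Placeholder master URL and job name.
    sc = SparkContext("spark://master:7077", "daemon-example")

    rdd = sc.parallelize(range(1000000))
    # Both lambdas execute inside the daemon.py-forked Python workers,
    # not inside the JVM, so that's where the CPU and memory usage shows up.
    total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
    print(total)

So during a PySpark stage it's expected that the heavy CPU use appears on daemon.py's child processes rather than on the java process.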
Hope that helps,
-Jey

On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo <[email protected]> wrote:
> Hello!
> I am using Spark 0.7.3 with the Python version. Recently, when I run some Spark
> programs on a cluster, I found that some processes invoked by
> spark-0.7.3/python/pyspark/daemon.py would occupy the CPU for a long time and
> consume a lot of memory (e.g., 5g for each process). It seemed that the java
> process, which was invoked by
> java -cp
> :/usr/lib/spark-0.7.3/conf:/usr/lib/spark-0.7.3/core/target/scala-2.9.3/classes
> ... , was 'competing' with daemon.py for CPU resources. From my
> understanding, the java process should be responsible for the 'real'
> computation in Spark.
> So I am wondering what work daemon.py actually does. Is it normal for it
> to consume a lot of CPU and memory?
> Thanks!
>
> Best,
> Shangyu Luo
>
> --
> Shangyu Luo
> Department of Computer Science
> Rice University
