Hi Shangyu,

The daemon.py Python process is the actual PySpark worker process; it is launched by the Spark worker whenever a Python job runs. So, when using PySpark, the "real computation" is handled by a Python process (via daemon.py), not by a Java process.
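As a minimal sketch (the master URL and app name below are just placeholders, not your setup): the lambdas get pickled and shipped to the Python worker processes that daemon.py forks on each worker node, while the JVM side mostly moves data in and out of those workers.

    from pyspark import SparkContext

    # Placeholder master URL and job name.
    sc = SparkContext("spark://master:7077", "daemon-example")

    rdd = sc.parallelize(range(1000000))
    # Both lambdas execute inside the daemon.py-forked Python workers,
    # not inside the JVM, so that's where the CPU and memory usage shows up.
    total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
    print(total)

So during a PySpark stage it's expected that the heavy CPU use appears on daemon.py's child processes rather than on the java process.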
Hope that helps,
-Jey

On Mon, Oct 7, 2013 at 9:50 PM, Shangyu Luo <[email protected]> wrote:
> Hello!
> I am using Spark 0.7.3 with the Python version. Recently, when I run some Spark
> programs on a cluster, I found that some processes invoked by
> spark-0.7.3/python/pyspark/daemon.py would occupy the CPU for a long time and
> consume a lot of memory (e.g., 5g for each process). It seemed that the java
> process, which was invoked by
> java -cp
> :/usr/lib/spark-0.7.3/conf:/usr/lib/spark-0.7.3/core/target/scala-2.9.3/classes
> ... , was 'competing' with daemon.py for CPU resources. From my
> understanding, the java process should be responsible for the 'real'
> computation in Spark.
> So I am wondering what work daemon.py actually does. Is it normal for it
> to consume a lot of CPU and memory?
> Thanks!
>
> Best,
> Shangyu Luo
>
> --
> Shangyu Luo
> Department of Computer Science
> Rice University
