Spark in a heterogeneous computing environment

2013-10-08 Thread Markus Losoi
Hi, Is it currently possible to define in Spark that some worker nodes should be preferred over other worker nodes? That is, in a heterogeneous computing environment some computing units can be more powerful than others, and assigning computing jobs to them should be prioritized. Best

Re: Spark dependency library causing problems with conflicting versions at import

2013-10-08 Thread Mingyu Kim
Thanks for the response! I'll try out the 2.10 branch; that seems to be the best bet for now. By the way, how does updating the Maven file do the private namespacing? We've been trying out jarjar (https://code.google.com/p/jarjar/), but as you mentioned, reflection has been biting us painfully so far. I'm

Re: The functionality of daemon.py?

2013-10-08 Thread Jey Kottalam
Hi Shangyu, The daemon.py Python process is the actual PySpark worker process, and it is launched by the Spark worker when running Python jobs. So, when using PySpark, the real computation is handled by a Python process (via daemon.py), not a Java process. Hope that helps, -Jey

Re: The functionality of daemon.py?

2013-10-08 Thread Shangyu Luo
Hello Jey, Thank you for answering. I have found that there are about 6 or 7 'daemon.py' processes on one worker node. Will each core have its own 'daemon.py' process? How is the number of 'daemon.py' processes on one worker node decided? I have also found that there are many Spark-related Java processes in a

Re: The functionality of daemon.py?

2013-10-08 Thread Shangyu Luo
Also, I found that 'daemon.py' will continue running on a worker node even after I terminated the Spark job at the master node, which is a little strange to me.

Re: spark_ec2 script in 0.8.0 and mesos

2013-10-08 Thread Aaron Davidson
Also, please post feature requests here: http://spark-project.atlassian.net Make sure to search prior to posting to avoid duplicates. On Tue, Oct 8, 2013 at 11:50 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi Shay, We actually don't support Mesos in the EC2 scripts anymore -- sorry

Re: spark through vpn, SPARK_LOCAL_IP

2013-10-08 Thread Aaron Babcock
Replying to document my fix: I was able to trick Spark into working by setting my hostname to my preferred IP address, i.e. $ sudo hostname 192.168.250.47. Not sure if this is a good idea in general, but it worked well enough for me to develop with my MacBook driving the cluster through the VPN.
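A minimal sketch of an alternative to changing the OS hostname, assuming the Spark version in use honors the spark.driver.host system property (the worker-side counterpart being the SPARK_LOCAL_IP variable from the subject line); the master URL, app name, and address below are placeholders, not taken from the thread:

```scala
import org.apache.spark.SparkContext

object VpnDriverSketch {
  def main(args: Array[String]): Unit = {
    // VPN address of the driver machine (the laptop); substitute your own.
    val vpnIp = "192.168.250.47"

    // Advertise the VPN address to the executors instead of whatever the
    // OS hostname resolves to. Must be set before the SparkContext exists.
    System.setProperty("spark.driver.host", vpnIp)

    // "spark://master-host:7077" is a placeholder master URL.
    val sc = new SparkContext("spark://master-host:7077", "vpn-test")

    // Trivial job just to confirm the executors can reach the driver back.
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}
```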

How would I start writing an RDD[ProtoBuf] and/or use sc.newAPIHadoopFile?

2013-10-08 Thread Shay Seng
Hi, I would like to store some data as a sequence of protobuf objects. I would of course need to be able to read that into an RDD and write the RDD back out in some binary format. First of all, is this supported natively (or through some download)? If not, are there examples of how I might write my
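There is no protobuf-specific format built into Spark's core API that I know of, but one common workaround is to serialize each message to its byte form and keep those bytes in a SequenceFile of BytesWritable values. The sketch below assumes a Spark 0.8-style Scala API and a hypothetical protobuf-generated message class MyProto; it illustrates the pattern rather than an official recipe:

```scala
import java.util.Arrays

import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicits needed for saveAsSequenceFile
import org.apache.spark.rdd.RDD

// `MyProto` stands in for whatever protobuf-generated message class you use.
object ProtoSeqFileSketch {

  // Write: serialize each message and store the bytes as SequenceFile values
  // (the keys are unused here, hence NullWritable).
  def save(protos: RDD[MyProto], path: String): Unit = {
    protos
      .map(p => (NullWritable.get(), new BytesWritable(p.toByteArray)))
      .saveAsSequenceFile(path)
  }

  // Read: load the raw bytes back and re-parse them into protobuf objects.
  def load(sc: SparkContext, path: String): RDD[MyProto] = {
    sc.sequenceFile(path, classOf[NullWritable], classOf[BytesWritable])
      .map { case (_, bw) =>
        // Hadoop reuses and pads Writable buffers, so copy only the valid range.
        MyProto.parseFrom(Arrays.copyOf(bw.getBytes, bw.getLength))
      }
  }
}
```

The same idea should also work through newAPIHadoopFile with a SequenceFileInputFormat, but sc.sequenceFile is the shorter route when the on-disk layout is simply key/value Writables.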