Hi
Is it currently possible to tell Spark that some worker nodes should be
preferred over other worker nodes? That is, in a heterogeneous computing
environment some compute nodes are more powerful than others, and
assigning jobs to those nodes should be prioritized.
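Spark's scheduler does not expose a per-node "power" weight directly (as this thread suggests, it would be a feature request), but the requested behavior can be illustrated with a toy capacity-weighted assignment. The `assign_tasks` helper and the weights below are made up for illustration; this is not a Spark API:

```python
def assign_tasks(tasks, workers):
    """Toy illustration of capacity-weighted task placement.

    workers maps a node name to a relative capacity weight; nodes with
    higher weights receive proportionally more tasks. This is NOT how
    Spark schedules work -- it is only a sketch of the requested behavior.
    """
    total = sum(workers.values())
    # Visit the most powerful nodes first so they are filled first.
    order = sorted(workers, key=workers.get, reverse=True)
    assignment = {name: [] for name in workers}
    i = 0
    for name in order:
        share = int(len(tasks) * workers[name] / total)
        assignment[name].extend(tasks[i:i + share])
        i += share
    # Any remainder from integer division goes to the fastest node.
    assignment[order[0]].extend(tasks[i:])
    return assignment
```

With weights {"fast": 3, "slow": 1}, the fast node ends up with roughly three quarters of the tasks.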
Best
Thanks for the response! I'll try out the 2.10 branch. That seems to be the
best bet for now.
Btw, how does updating the Maven file do the private namespacing? We've been
trying out jarjar (https://code.google.com/p/jarjar/), but as you mentioned,
reflection has been biting us painfully so far. I'm
Hi Shangyu,
The daemon.py Python process is the actual PySpark worker process, and
is launched by the Spark worker when running Python jobs. So, when
using PySpark, the real computation is handled by a Python process
(via daemon.py), not a Java process.
Hope that helps,
-Jey
On Mon, Oct 7, 2013
Hello Jey,
Thank you for answering. I have found that there are about six or seven
'daemon.py' processes on one worker node. Will each core have a 'daemon.py'
process? How is the number of 'daemon.py' processes per worker node decided? I
have also found that there are many Spark-related Java processes in a
Also, I found that the 'daemon.py' processes will continue running on a worker
node even after I terminate the Spark job from the master node. That seems a
little strange to me.
2013/10/8 Shangyu Luo lsy...@gmail.com
Also, please post feature requests here: http://spark-project.atlassian.net
Make sure to search prior to posting to avoid duplicates.
On Tue, Oct 8, 2013 at 11:50 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Shay,
We actually don't support Mesos in the EC2 scripts anymore -- sorry
Replying to document my fix:
I was able to trick Spark into working by setting my hostname to my
preferred IP address, i.e.:
$ sudo hostname 192.168.250.47
Not sure if this is a good idea in general, but it worked well enough
for me to develop with my MacBook driving the cluster through the VPN.
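For what it's worth, a less invasive way to get the same effect (assuming your Spark build honors it, as the standard conf/spark-env.sh setup does) is to set SPARK_LOCAL_IP instead of rewriting the machine's hostname; the address below is just the one from this thread:

```shell
# In conf/spark-env.sh (or exported before launching the driver):
# bind Spark to a specific interface instead of whatever the
# hostname happens to resolve to.
export SPARK_LOCAL_IP=192.168.250.47
```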
Hi,
I would like to store some data as a sequence of protobuf objects. I would of
course need to be able to read that into an RDD and write the RDD back out
in some binary format.
First of all, is this supported natively (or through some download)?
If not, are there examples on how I might write my
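As far as I know there is no built-in protobuf format at this point, so the usual approach is to write the serialized messages with length-delimited framing and then plug that into a custom Hadoop InputFormat (or parse the bytes yourself). Here is a minimal sketch of the framing itself, independent of Spark; the function names are made up for illustration, and the raw byte strings stand in for `msg.SerializeToString()` output:

```python
import struct
from io import BytesIO

def write_delimited(records, fh):
    # Prefix each serialized message (e.g. msg.SerializeToString())
    # with a 4-byte big-endian length so the stream can be split back
    # into individual messages when reading.
    for rec in records:
        fh.write(struct.pack(">I", len(rec)))
        fh.write(rec)

def read_delimited(fh):
    # Inverse of write_delimited: read length headers until EOF,
    # returning the raw message payloads.
    records = []
    while True:
        header = fh.read(4)
        if len(header) < 4:
            break
        (length,) = struct.unpack(">I", header)
        records.append(fh.read(length))
    return records

# Round-trip example with raw bytes standing in for protobuf payloads.
buf = BytesIO()
write_delimited([b"first", b"second record"], buf)
buf.seek(0)
```

Each payload read back this way can then be handed to the generated protobuf class's `ParseFromString`.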