" I am working on a spark application that requires the ability to run a
function on each node in the cluster
"
--
Use Apache Ignite instead of Spark. Trust me, it's awesome for this use case.
Regards,
Rabin Banerjee
On Jul 19, 2016 3:27 AM, "joshuata" wrote:
> I am working on a spark application
Technical limitations keep us from running another filesystem on the SSDs.
We are running on a very large HPC cluster without control over low-level
system components. We have tried setting up an ad-hoc HDFS cluster on the
nodes in our allocation, but we have had very little luck. It ends up being
more trouble than it is worth.
Thank you for that advice. I have tried similar techniques, but not that
one.
On Mon, Jul 18, 2016 at 11:42 PM Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:
> Thanks for the explanation. Try creating a custom RDD whose getPartitions
> returns an array of custom partition objects of size n
The whole point of a well-designed global filesystem is to not move the data.
On Jul 19, 2016 10:07, "Koert Kuipers" wrote:
> If you run hdfs on those ssds (with low replication factor) wouldn't it
> also effectively write to local disk with low latency?
>
> On Jul 18, 2016 21:54, "Josh Asplund" wrote:
Thanks for the explanation. Try creating a custom RDD whose getPartitions
returns an array of custom partition objects of size n (= number of nodes).
In a custom partition object, you can have the file path and ip/hostname
where the partition needs to be computed. Then, have getPreferredLocations
return that ip/hostname so that Spark tries to schedule each partition's task
on the node that holds its data.
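
A minimal sketch of that approach, assuming Spark's Scala RDD API; the
NodeLocalRDD and NodeLocalPartition names (and the one-file-per-node layout)
are illustrative, not from the thread:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // Illustrative partition: one per node, carrying the node-local file
    // path and the host that should compute it.
    case class NodeLocalPartition(index: Int, path: String, host: String)
      extends Partition

    // Illustrative RDD with one partition per (path, host) assignment.
    class NodeLocalRDD(sc: SparkContext, assignments: Seq[(String, String)])
        extends RDD[String](sc, Nil) {

      override protected def getPartitions: Array[Partition] =
        assignments.zipWithIndex.map { case ((path, host), i) =>
          NodeLocalPartition(i, path, host): Partition
        }.toArray

      // The scheduler consults this to place each task on its host
      // (best effort, not a guarantee).
      override protected def getPreferredLocations(split: Partition): Seq[String] =
        Seq(split.asInstanceOf[NodeLocalPartition].host)

      override def compute(split: Partition, context: TaskContext): Iterator[String] = {
        val p = split.asInstanceOf[NodeLocalPartition]
        // Node-local work goes here, e.g. reading p.path from the local SSD.
        Iterator(s"processed ${p.path} on ${p.host}")
      }
    }

Something like new NodeLocalRDD(sc, assignments).collect() would then run one
task per assignment, preferentially on the named hosts.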
The spark workers are running side-by-side with scientific simulation code.
The code writes output to local SSDs to keep latency low. Due to the volume
of data being moved (tens of terabytes and up), it isn't really feasible to
copy the data to a global filesystem. Executing a function on each node would
let us work on the data in place, without moving it off the nodes.
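
For what it's worth, a lighter-weight sketch of the same idea uses
SparkContext.makeRDD, which accepts per-element location preferences. The
host names below are placeholders, and placement is best effort (see the
node-failure caveat in the next message):

    // Assumes an existing SparkContext `sc`; replace the host names with
    // the nodes in the actual allocation.
    val hosts = Seq("node01", "node02", "node03")

    // makeRDD takes (value, preferredLocations) pairs and creates one
    // partition per element, which Spark tries to schedule on that host.
    val perNode = sc.makeRDD(hosts.map(h => (h, Seq(h))))

    perNode.foreach { host =>
      // Node-local work goes here, e.g. reading simulation output from
      // the local SSD. Any println output appears in the executor logs.
      println(s"running on $host")
    }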
You can't assume that the number of nodes will be constant, as some may
fail, hence you can't guarantee that a function will execute at most once
or at least once on a node. Can you explain your use case in a bit more
detail?
On Mon, Jul 18, 2016, 10:57 PM joshuata wrote:
> I am working on a spark