Do Ignite and Alluxio offer reasonable means of transferring data, in memory,
from Spark to MPI? A straightforward way to transfer data is to use piping, but
unless you have MPI processes running in a one-to-one mapping to the Spark
partitions, this will require some complicated logic to get working (you'll
have to handle multiple tasks sending their data to one process).
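
To make the many-to-one case concrete, here is a minimal sketch of the
bookkeeping I have in mind. It assumes a simple block mapping in which
contiguous Spark partitions are assigned to the same MPI rank. The function
names are hypothetical and nothing here calls Spark or MPI; it only computes
which rank each partition's data would be sent to.

```python
from typing import List


def rank_for_partition(partition: int, num_partitions: int, num_ranks: int) -> int:
    """Return the MPI rank that should receive a given Spark partition.

    Assumption: a block mapping, so contiguous partitions share a rank.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    if not 0 <= partition < num_partitions:
        raise ValueError("partition index out of range")
    # Integer scaling of the partition index into the range [0, num_ranks).
    return partition * num_ranks // num_partitions


def partitions_for_rank(rank: int, num_partitions: int, num_ranks: int) -> List[int]:
    """Return every partition whose data a given rank must collect."""
    return [
        p
        for p in range(num_partitions)
        if rank_for_partition(p, num_partitions, num_ranks) == rank
    ]


if __name__ == "__main__":
    # Example: 10 Spark partitions feeding 4 MPI ranks.
    for r in range(4):
        print(r, partitions_for_rank(r, 10, 4))
```

Each receiving process would still need to accept data from several tasks
and know when all of them have finished, which is the part that makes the
piping approach awkward.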

It seems that Ignite and Alluxio might allow you to pull the data you want
into each of your MPI processes without worrying about such a requirement,
but it's not clear to me from the high-level descriptions of the systems
whether this can be readily realized. Is this the case?

Another issue is that with the piping solution, you only need to store two
copies of the data: one each on the Spark and MPI sides. With Ignite and
Alluxio, would you need three? It seems that they let you replace the
standard RDDs with RDDs backed by their memory stores, but do those
perform as efficiently as the standard Spark RDDs that are persisted in
memory?

More generally, I'd be interested to know if there are existing solutions to
this problem of transferring data between MPI and Spark. Thanks for any
insight you can offer!

Sent from the Apache Spark User List mailing list archive.