Hi,

Alluxio allows applications to share data through a file system API: the native Java Alluxio client, the Hadoop FileSystem interface, or POSIX through FUSE. If your MPI applications can use any of these interfaces, you should be able to use Alluxio for data sharing out of the box.
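To make the POSIX option concrete, here is a minimal sketch of what one process writing a file and another reading it back could look like. It uses only ordinary Python file I/O, which is the point of the FUSE interface. The `ALLUXIO_FUSE_MOUNT` environment variable, the helper names, and the file name are all made up for illustration; the script falls back to a temporary directory so it runs without Alluxio installed.

```python
import os
import tempfile


def write_shared(mount_dir, name, payload):
    """Write bytes to a file under mount_dir using plain POSIX file I/O."""
    path = os.path.join(mount_dir, name)
    with open(path, "wb") as f:
        f.write(payload)
    return path


def read_shared(mount_dir, name):
    """Read the whole file back; any process that sees the mount can do this."""
    path = os.path.join(mount_dir, name)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # ALLUXIO_FUSE_MOUNT is a hypothetical variable, not an Alluxio setting.
    # Point it at wherever your Alluxio FUSE mount lives; otherwise a
    # temporary local directory stands in for the mount.
    mount_dir = os.environ.get("ALLUXIO_FUSE_MOUNT") or tempfile.mkdtemp()

    # Producer side (for example, one MPI rank or a separate application).
    write_shared(mount_dir, "dataset.bin", b"example payload")

    # Consumer side (another rank or application using the same mount).
    print(read_shared(mount_dir, "dataset.bin"))
```

A FUSE mount does not behave exactly like a local disk in every respect (random writes in particular may be restricted), so it is worth checking the Alluxio FUSE documentation against your applications' access patterns.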
In terms of duplicating in-memory data, you should only need one copy, held in Alluxio, if your applications are able to stream the dataset. As for the performance of backing your data with Alluxio compared to using Spark's native in-memory representation, this blog post details the pros and cons of the two approaches: <http://www.alluxio.com/2016/08/effective-spark-rdds-with-alluxio/>. At a high level, Alluxio performs better with larger datasets or if you plan to use your dataset in more than one Spark job.

Hope this helps,
Calvin