All,

Does anyone have a reference to a publication, or to informal sources (blogs, notes), showing the performance of Spark on HDFS vs. other shared file systems (Lustre, etc.) or other file systems (NFS)?
I need this for formal performance research. We are currently researching this on a very specific, boutique machine, and we are seeing some controversial results. For the purpose of a literature survey and general comparison, I would like to see the findings that others have had. I know that the general wisdom is that Spark and HDFS should work best together because of data locality awareness.

Thank you,

*Edmon Begoli, PhD*
Chief Data Officer
Joint Institute for Computational Sciences (JICS)
ebeg...@tennessee.edu
https://www.linkedin.com/in/ebegoli