Hi Mutahir, I will try to answer some of your questions.
Q1) Can we use MapReduce and Apache Spark in the same cluster?

Yes. I run a cluster with both MapReduce2 and Spark, and I use YARN as the resource manager.

Q2) Is it mandatory to use GPUs for Apache Spark?

No. My cluster has Spark and does not have any GPUs.

Q3) I read that Apache Spark is in-memory; will it benefit from SSD/flash for caching or persistent storage?

As you noted, Spark is primarily in-memory, but there are a few places where faster storage may help, including:

- Storage of the data read into Spark from HDFS DataNodes
- RDD persistence <https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence>, when the chosen storage level includes one of the disk options
- The Spark shuffle service - between Spark stages, which process data in-memory, intermediate results from Spark executors are written to local storage and served to the next stage by the shuffle service

I don't have any benchmark results for these, but it might be something you want to look into.

Thanks,
Jamison

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
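For the shuffle point, the directories Spark uses for shuffle files and disk spills are set by the `spark.local.dir` property. A sketch of a spark-defaults.conf entry (the /mnt/ssd* mount points are hypothetical):

```
# Point Spark's scratch space (shuffle files, spilled partitions,
# disk-persisted RDD blocks) at SSD mounts; multiple disks are
# comma-separated so I/O is spread across them.
spark.local.dir    /mnt/ssd1/spark,/mnt/ssd2/spark
```

Note that when running on YARN, Spark inherits the NodeManager's local directories (yarn.nodemanager.local-dirs) and ignores `spark.local.dir`, so on a YARN cluster the SSD paths would be configured there instead.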