Hi Mutahir,

I will try to answer some of your questions.

Q1) Can we use MapReduce and Apache Spark in the same cluster?
Yes. I run a cluster with both MapReduce2 and Spark, using YARN as the
resource manager for both.
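
As a minimal sketch, a Spark application can target the same YARN cluster
that runs your MapReduce jobs just by setting the master to "yarn". This
assumes HADOOP_CONF_DIR (or YARN_CONF_DIR) points at your cluster's Hadoop
configuration so Spark can find the ResourceManager; the app name is made up:

  from pyspark.sql import SparkSession

  # Submit to the existing YARN cluster; YARN schedules the Spark
  # executors alongside MapReduce containers.
  spark = (SparkSession.builder
           .appName("shared-cluster-example")  # hypothetical app name
           .master("yarn")
           .getOrCreate())

  print(spark.range(100).count())  # trivial job to confirm it runs
  spark.stop()

In practice the master is often set with spark-submit instead of in code;
either way, YARN arbitrates resources between the two frameworks.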

Q2) Is it mandatory to use GPUs for Apache Spark?
No. My cluster has Spark and does not have any GPUs.

Q3) I read that Apache Spark is in-memory; will it benefit from SSD / Flash
for caching or persistent storage?
As you noted, Spark is primarily in-memory, but there are a few places
where faster storage may help, including:
- Storage of the data files read into Spark from HDFS DataNodes
- RDD persistence
<https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence>
  when the chosen storage level includes one of the disk options (see the
  sketch below)
- The Spark shuffle service - between the stages that process data
in-memory, intermediate results from Spark executors are written to local
storage and served to the next stage by the shuffle service.
I don't have benchmark numbers for any of these, but they may be worth
investigating.
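
To make the last two concrete, here is a minimal PySpark sketch. The path
/mnt/ssd/spark-tmp is a hypothetical mount point for fast local storage;
spark.local.dir is the setting that controls where shuffle and spill files
land (note that under YARN the NodeManager's local dirs take precedence
over it):

  from pyspark import StorageLevel
  from pyspark.sql import SparkSession

  # Point shuffle/spill scratch space at fast local storage.
  # /mnt/ssd/spark-tmp is a hypothetical SSD mount; use your own path.
  spark = (SparkSession.builder
           .appName("ssd-storage-example")  # hypothetical app name
           .config("spark.local.dir", "/mnt/ssd/spark-tmp")
           .getOrCreate())

  rdd = spark.sparkContext.parallelize(range(1000000))

  # MEMORY_AND_DISK spills partitions that don't fit in memory to disk,
  # so faster storage speeds up re-reads of the spilled partitions.
  rdd.persist(StorageLevel.MEMORY_AND_DISK)

  print(rdd.count())  # first action materializes and caches the RDD
  print(rdd.count())  # later actions re-read any spilled partitions
  spark.stop()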

Thanks,
Jamison


