One reason Spark on disk is faster than MapReduce is Spark's advanced Directed 
Acyclic Graph (DAG) engine. MapReduce requires a complex job to be split into 
multiple MapReduce jobs, with disk I/O at the end of each job and at the 
beginning of the next. With Spark, you may be able to express the same job in 
fewer stages, incurring less disk I/O. Disk I/O is expensive, so fewer disk 
I/O operations translate to better performance.
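
To make the counting argument concrete, here is a rough sketch in plain 
Python. It is a toy cost model, not a benchmark and not Spark code: the 
function names are made up for illustration, and it assumes that every 
shuffle in the pipeline forces a separate MapReduce job, while a DAG engine 
runs the whole pipeline as one job. Shuffle I/O inside a job is left out on 
the assumption that both approaches pay for it.

```python
# Toy model (hypothetical names): count only the disk reads and writes
# that happen at job boundaries for a pipeline with N shuffles.

def mapreduce_boundary_io(num_shuffles):
    """Assume each shuffle needs its own MapReduce job, and each job
    reads its input from disk and writes its output back to disk."""
    jobs = max(1, num_shuffles)
    return 2 * jobs  # one read + one write per job


def dag_boundary_io(num_shuffles):
    """Assume one DAG job: read the input once, write the final output
    once, with no job-boundary write between intermediate stages."""
    return 2  # independent of the number of shuffles in this model


if __name__ == "__main__":
    for shuffles in (1, 2, 3):
        print(shuffles,
              mapreduce_boundary_io(shuffles),
              dag_boundary_io(shuffles))
```

In this model the two approaches tie for a single-shuffle job, and the gap 
grows with each extra shuffle, which is the "complex job" case described 
above.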

Mohammed

From: Ilya Ganelin [mailto:ilgan...@gmail.com]
Sent: Monday, April 27, 2015 7:55 PM
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk

I believe the typical answer is that Spark is actually a bit slower.
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com <bit1...@163.com> wrote:
Hi,

I am frequently asked why Spark is also much faster than Hadoop MapReduce on 
disk (without the use of memory cache). I have no convincing answer to this 
question. Could you guys elaborate on this? Thanks!
