RE: Why Spark is much faster than Hadoop MapReduce even on disk
One reason Spark is faster than MapReduce even on disk is Spark's advanced Directed Acyclic Graph (DAG) execution engine. MapReduce requires a complex job to be split into multiple MapReduce jobs, with disk I/O at the end of each job and at the beginning of the next. With Spark, you may be able to express the same job in fewer stages, incurring less disk I/O. Disk I/O is expensive, so fewer disk I/O operations translate to better performance.
Mohammed
From: Ilya Ganelin [mailto:ilgan...@gmail.com]
Sent: Monday, April 27, 2015 7:55 PM
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk
I believe the typical answer is that Spark is actually a bit slower.
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com wrote:
Hi, I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of the memory cache). I have no convincing answer to this question. Could you elaborate on this? Thanks!
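To make the DAG argument in the reply above concrete, here is a minimal sketch in plain Python. It is a toy counting model, not a benchmark and not Spark or Hadoop code. The assumptions are mine: the pipeline is linear, every wide (shuffle) operation forces a new MapReduce job, each MapReduce job hands its output to the next job through HDFS, and a DAG engine passes data between stages through shuffle files only. The names `WIDE_OPS`, `mapreduce_plan`, and `dag_plan` are hypothetical. Note that both engines still write shuffle data to local disk; the model only counts the extra HDFS round trips between chained jobs.

```python
# Toy model (assumption: linear pipeline, one MapReduce job per shuffle).
# Operation names mirror common Spark transformations, but nothing here
# calls Spark or Hadoop.
WIDE_OPS = {"reduceByKey", "groupByKey", "sortByKey"}


def count_shuffles(pipeline):
    """Number of operations in the pipeline that require a shuffle."""
    return sum(1 for op in pipeline if op in WIDE_OPS)


def mapreduce_plan(pipeline):
    """Chained MapReduce: narrow ops fold into a map or reduce phase,
    and every job except the last writes its output to HDFS for the next."""
    jobs = max(1, count_shuffles(pipeline))
    return {"jobs": jobs, "intermediate_hdfs_writes": jobs - 1}


def dag_plan(pipeline):
    """Single DAG job: stages are split at shuffle boundaries, and
    intermediate data moves via shuffle files rather than HDFS."""
    stages = count_shuffles(pipeline) + 1
    return {"jobs": 1, "stages": stages, "intermediate_hdfs_writes": 0}


pipeline = ["map", "reduceByKey", "filter", "sortByKey", "map", "groupByKey"]
print("MapReduce:", mapreduce_plan(pipeline))
print("DAG:      ", dag_plan(pipeline))
```

Under these assumptions, the gap grows with the number of shuffles in the pipeline; a job with a single shuffle shows no difference, which is consistent with the other replies in this thread reporting that simple disk-to-disk jobs are not faster in Spark.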
Re: Why Spark is much faster than Hadoop MapReduce even on disk
Our experience is that unless you can benefit from Spark features such as co-partitioning, which allow for more efficient execution, Spark is slightly slower for disk-to-disk workloads.
On Apr 27, 2015 10:34 PM, bit1...@163.com wrote:
Hi, I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of the memory cache). I have no convincing answer to this question. Could you elaborate on this? Thanks!
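The co-partitioning mentioned in the reply above can be sketched in plain Python. This is only an illustration of the idea, assuming two keyed datasets split with the same hash function into the same number of partitions; it does not use Spark, and `partition_by_key` and `join_copartitioned` are hypothetical helper names. When both sides are partitioned identically, matching keys already sit in partitions with the same index, so the join can proceed partition by partition without moving records between partitions.

```python
from collections import defaultdict


def partition_by_key(records, num_partitions):
    """Assign each (key, value) record to a partition by hashing its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def join_copartitioned(left_parts, right_parts):
    """Inner join of two co-partitioned datasets.

    Partition i of the left side is joined only with partition i of the
    right side, so no record has to be moved to another partition.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("datasets must have the same number of partitions")
    joined = []
    for left, right in zip(left_parts, right_parts):
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        for key, value in left:
            for other in index.get(key, []):
                joined.append((key, (value, other)))
    return joined


users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

user_parts = partition_by_key(users, 2)
order_parts = partition_by_key(orders, 2)
print(sorted(join_copartitioned(user_parts, order_parts)))
```

If the two sides used different partitioners or partition counts, at least one of them would have to be repartitioned first, which is the shuffle cost that co-partitioning avoids.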
Why Spark is much faster than Hadoop MapReduce even on disk
Hi, I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of the memory cache). I have no convincing answer to this question. Could you elaborate on this? Thanks!
Re: Why Spark is much faster than Hadoop MapReduce even on disk
I believe the typical answer is that Spark is actually a bit slower.
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com wrote:
Hi, I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of the memory cache). I have no convincing answer to this question. Could you elaborate on this? Thanks!
Re: Re: Why Spark is much faster than Hadoop MapReduce even on disk
Is it? I learned elsewhere that Spark is 5 to 10 times faster than Hadoop MapReduce.
bit1...@163.com
From: Ilya Ganelin
Date: 2015-04-28 10:55
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk
I believe the typical answer is that Spark is actually a bit slower.
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com wrote:
Hi, I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of the memory cache). I have no convincing answer to this question. Could you elaborate on this? Thanks!
Re: Why Spark is much faster than Hadoop MapReduce even on disk
http://www.datascienceassn.org/content/making-sense-making-sense-performance-data-analytics-frameworks
From: bit1...@163.com
To: user user@spark.apache.org
Sent: Monday, April 27, 2015 8:33 PM
Subject: Why Spark is much faster than Hadoop MapReduce even on disk
Hi, I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of the memory cache). I have no convincing answer to this question. Could you elaborate on this? Thanks!