RE: Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-28 Thread Mohammed Guller
One reason Spark on disk is faster than MapReduce is Spark’s advanced Directed 
Acyclic Graph (DAG) engine. MapReduce requires a complex job to be split 
into multiple MapReduce jobs, with disk I/O at the end of each job and at the 
beginning of the next. With Spark, you may be able to express the same job 
with fewer stages, incurring less disk I/O. Disk I/O is expensive, so fewer 
disk I/O operations translate to better performance.

Mohammed

From: Ilya Ganelin [mailto:ilgan...@gmail.com]
Sent: Monday, April 27, 2015 7:55 PM
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk

I believe the typical answer is that Spark is actually a bit slower.
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com bit1...@163.com wrote:
Hi,

I am frequently asked why Spark is also much faster than Hadoop MapReduce on 
disk (without the use of memory cache). I have no convincing answer for this 
question. Could you guys elaborate on this? Thanks!





Re: Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-28 Thread Koert Kuipers
Our experience is that, unless you can benefit from Spark features such as
co-partitioning that allow for more efficient execution, Spark is
slightly slower for disk to disk.
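
For readers unfamiliar with co-partitioning, the idea can be sketched in plain Python. This is not the Spark API; the helper names and sample data are hypothetical. The assumption is that when two datasets are hash-partitioned with the same function and the same number of partitions, matching keys land in the same partition index, so a join can proceed partition by partition without another shuffle.

```python
def partition(records, num_partitions):
    """Hash-partition (key, value) pairs into num_partitions buckets."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def local_join(left_parts, right_parts):
    """Join two co-partitioned datasets one partition pair at a time."""
    out = []
    for left, right in zip(left_parts, right_parts):
        # Build a lookup for this partition only; no data crosses partitions.
        index = {}
        for key, value in right:
            index.setdefault(key, []).append(value)
        for key, value in left:
            for other in index.get(key, []):
                out.append((key, (value, other)))
    return out

# Hypothetical sample data with integer keys.
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink")]

joined = local_join(partition(users, 4), partition(orders, 4))
print(sorted(joined))
```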
On Apr 27, 2015 10:34 PM, bit1...@163.com bit1...@163.com wrote:

 Hi,

 I am frequently asked why Spark is also much faster than Hadoop MapReduce
 on disk (without the use of memory cache). I have no convincing answer for
 this question. Could you guys elaborate on this? Thanks!

 --




Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-27 Thread bit1...@163.com
Hi,

I am frequently asked why Spark is also much faster than Hadoop MapReduce on 
disk (without the use of memory cache). I have no convincing answer for this 
question. Could you guys elaborate on this? Thanks!






Re: Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-27 Thread Ilya Ganelin
I believe the typical answer is that Spark is actually a bit slower.
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com bit1...@163.com wrote:

 Hi,

 I am frequently asked why Spark is also much faster than Hadoop MapReduce
 on disk (without the use of memory cache). I have no convincing answer for
 this question. Could you guys elaborate on this? Thanks!

 --




Re: Re: Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-27 Thread bit1...@163.com
Is it? I learned somewhere else that Spark is 5 to 10 times faster than 
Hadoop MapReduce.



bit1...@163.com
 
From: Ilya Ganelin
Date: 2015-04-28 10:55
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk
I believe the typical answer is that Spark is actually a bit slower. 
On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com bit1...@163.com wrote:
Hi,

I am frequently asked why Spark is also much faster than Hadoop MapReduce on 
disk (without the use of memory cache). I have no convincing answer for this 
question. Could you guys elaborate on this? Thanks!






Re: Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-27 Thread Michael Malak
http://www.datascienceassn.org/content/making-sense-making-sense-performance-data-analytics-frameworks
  
  From: bit1...@163.com bit1...@163.com
 To: user user@spark.apache.org 
 Sent: Monday, April 27, 2015 8:33 PM
 Subject: Why Spark is much faster than Hadoop MapReduce even on disk
   
Hi,
I am frequently asked why Spark is also much faster than Hadoop MapReduce on 
disk (without the use of memory cache). I have no convincing answer for this 
question. Could you guys elaborate on this? Thanks!