I don't think that quote is true in general. Given a map-only task, or a map+shuffle+reduce, I'd expect MapReduce to be the same speed or a little faster. It is the simpler, lower-level, narrower mechanism. It's all limited by how fast you can read/write data and execute the user code.
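To make that concrete for readers newer to the model, here is a toy, framework-free sketch (plain Python; all names are made up for illustration, not any real Spark or Hadoop API) of what a single map+shuffle+reduce stage does. Both engines ultimately run user code through steps like these, which is why one such stage is bounded by I/O and the user code itself:

```python
from collections import defaultdict

def run_stage(records, map_fn, reduce_fn):
    """Run one map -> shuffle -> reduce pass over in-memory records."""
    # Map: user code turns each record into (key, value) pairs.
    mapped = [pair for rec in records for pair in map_fn(rec)]
    # Shuffle: group values by key (a real engine moves these across the network).
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: user code folds each key's values into a result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word count as the user code.
def tokenize(line):
    return [(word, 1) for word in line.split()]

def count(word, ones):
    return sum(ones)

lines = ["spark and mapreduce", "spark in memory"]
print(run_stage(lines, tokenize, count))
# -> {'spark': 2, 'and': 1, 'mapreduce': 1, 'in': 1, 'memory': 1}
```

A multi-stage pipeline would call something like run_stage repeatedly; the difference in that case is whether the output of one call is written to disk before feeding the next, or handed over in memory.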
There's a big difference if you're executing a many-stage pipeline, where a chain of M/R jobs would write back to disk after each stage but a Spark job could stay in memory. That is most of the source of that quote.

I think the argument for Spark is 95% that it's a higher-level API; writing M/R takes tens of times more code. But people were already using things like Crunch on M/R before Spark anyway. Spark still adds value with things like the DataFrame API: if you're doing work that fits its constraints, it can optimize more under the hood, whereas M/R is just a job scheduler for user code.

On Mon, Jun 6, 2016 at 4:12 AM, Deepak Goel <deic...@gmail.com> wrote:

> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
> Sorry about that (the question might still be general, as I am new to Spark).
>
> My question is:
>
> Spark claims to be 10x faster on disk and 100x faster in memory compared to MapReduce. Is there any benchmark paper for this claim which sketches out the details? Is the benchmark true for all applications/platforms, or only for a particular platform?
>
> Also, has someone studied which changes in Spark, compared to MapReduce, cause the performance improvement? For example:
>
> Change A in Spark vs. MapReduce (multiple spill files in the mapper) -> % reduction in the number of instructions -> 2x performance benefit -> any disadvantages, such as availability, or conditions like the system needing multiple disk I/O channels
>
> Change B in Spark vs. MapReduce (difference in data consolidation in the reducer) -> % reduction in the number of instructions -> 1.5x performance benefit -> any disadvantages, such as availability
>
> And so on...
>
> Also, has a cost analysis been included in such a study? Any case studies?
>
> Deepak
>
> ===========================================
>
> Two questions:
>
> 1. Is this related to the thread in any way? If not, please start a new one; otherwise you confuse people like myself.
>
> 2. This question is so general; do you understand the similarities and differences between Spark and MapReduce? Learn first, then ask questions.
>
> Spark can map-reduce.
>
> Sent from my iPhone
>
> On Jun 5, 2016, at 4:37 PM, Deepak Goel <deic...@gmail.com> wrote:
>
>> Hello
>>
>> Sorry, I am new to Spark.
>>
>> Spark claims it can do all that MapReduce can do (and more!), but 10x faster on disk and 100x faster in memory. Why would I then use MapReduce at all?
>>
>> Thanks
>> Deepak
>
> --
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more : http://www.gridrepublic.org"