I don't think that quote is true in general. Given a map-only task, or a map+shuffle+reduce, I'd expect MapReduce to be the same speed or a little faster. It is the simpler, lower-level, narrower mechanism. It's all limited by how fast you can read/write data and execute the user code.
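To make that concrete for readers newer to the model, here is a toy, framework-free sketch (plain Python; all names are made up for illustration, not any real Spark or Hadoop API) of what a single map+shuffle+reduce stage does. Both engines ultimately run user code through steps like these, which is why one such stage is bounded by I/O and the user code itself:

```python
from collections import defaultdict

def run_stage(records, map_fn, reduce_fn):
    """Run one map -> shuffle -> reduce pass over in-memory records."""
    # Map: user code turns each record into (key, value) pairs.
    mapped = [pair for rec in records for pair in map_fn(rec)]
    # Shuffle: group values by key (a real engine moves these across the network).
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: user code folds each key's values into a result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word count as the user code.
def tokenize(line):
    return [(word, 1) for word in line.split()]

def count(word, ones):
    return sum(ones)

lines = ["spark and mapreduce", "spark in memory"]
print(run_stage(lines, tokenize, count))
# -> {'spark': 2, 'and': 1, 'mapreduce': 1, 'in': 1, 'memory': 1}
```

A multi-stage pipeline would call something like run_stage repeatedly; the difference in that case is whether the output of one call is written to disk before feeding the next, or handed over in memory.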
There's a big difference if you're executing a many-stage pipeline, where a chain of M/R jobs would write back to disk after each stage but a Spark job could stay in memory. That is most of the source of that quote.

I think the argument for Spark is 95% that it's a higher-level API; writing M/R takes tens of times more code. But people were already using things like Crunch on M/R before Spark anyway. Spark still adds value with things like the DataFrame API: if you're doing work that fits its constraints, it can optimize more under the hood, whereas M/R is just a job scheduler for user code.

On Mon, Jun 6, 2016 at 4:12 AM, Deepak Goel <deic...@gmail.com> wrote:

> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
> Sorry about that (the question might still be general, as I am new to Spark).
>
> My question is:
>
> Spark claims to be 10x faster on disk and 100x faster in memory compared to MapReduce. Is there any benchmark paper for this claim which sketches out the details? Is the benchmark true for all applications/platforms, or only for a particular platform?
>
> Also, has someone studied which changes in Spark, compared to MapReduce, cause the performance improvement? For example:
>
> Change A in Spark vs. MapReduce (multiple spill files in the mapper) -> % reduction in the number of instructions -> 2x performance benefit -> any disadvantages, such as availability, or conditions like the system needing multiple disk I/O channels
>
> Change B in Spark vs. MapReduce (difference in data consolidation in the reducer) -> % reduction in the number of instructions -> 1.5x performance benefit -> any disadvantages, such as availability
>
> And so on...
>
> Also, has a cost analysis been included in such a study? Any case studies?
>
> Deepak
>
> ===========================================
>
> Two questions:
>
> 1. Is this related to the thread in any way? If not, please start a new one; otherwise you confuse people like myself.
>
> 2. This question is so general; do you understand the similarities and differences between Spark and MapReduce? Learn first, then ask questions.
>
> Spark can map-reduce.
>
> Sent from my iPhone
>
> On Jun 5, 2016, at 4:37 PM, Deepak Goel <deic...@gmail.com> wrote:
>
>> Hello
>>
>> Sorry, I am new to Spark.
>>
>> Spark claims it can do all that MapReduce can do (and more!), but 10x faster on disk and 100x faster in memory. Why would I then use MapReduce at all?
>>
>> Thanks
>> Deepak
>
> --
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more : http://www.gridrepublic.org"