Vanilla MapReduce does not expose it, but Hive on top of MapReduce has superior partitioning (and bucketing) support compared to Spark.

2015-06-28 13:44 GMT-07:00 Koert Kuipers <ko...@tresata.com>:

> spark is partitioner aware, so it can exploit a situation where 2 datasets
> are partitioned the same way (for example by doing a map-side join on
> them). map-red does not expose this.
>
> On Sun, Jun 28, 2015 at 12:13 PM, YaoPau <jonrgr...@gmail.com> wrote:
>
>> I've heard "Spark is not just MapReduce" mentioned during Spark talks,
>> but it seems like every method that Spark has is really doing something
>> like (Map -> Reduce) or (Map -> Map -> Map -> Reduce) etc behind the
>> scenes, with the performance benefit of keeping RDDs in memory between
>> stages.
>>
>> Am I wrong about that? Is Spark doing anything more efficiently than a
>> series of Maps followed by a Reduce in memory? What methods does Spark
>> have that can't easily be mapped (with somewhat similar efficiency) to
>> Map and Reduce in memory?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/What-does-Spark-is-not-just-MapReduce-mean-Isn-t-every-Spark-job-a-form-of-MapReduce-tp23518.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
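To make the point about partitioner awareness concrete, here is a minimal plain-Python sketch. It is not Spark code: `HashPartitioner`, `partition_by` and `co_partitioned_join` below are hypothetical stand-ins, loosely modeled on the idea behind Spark's `HashPartitioner` and `partitionBy`. It assumes both datasets were hash-partitioned with the same number of partitions, so a given key lands at the same partition index on both sides and the join can run partition by partition with no data movement. In Spark itself, the analogous step is calling `partitionBy` with the same partitioner on both pair RDDs before `join`.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Record = Tuple[Hashable, str]
Partitions = List[List[Record]]


class HashPartitioner:
    """Hypothetical illustration: maps a key to a partition index by hashing."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions

    def partition(self, key: Hashable) -> int:
        # Python's % is non-negative for a positive divisor.
        return hash(key) % self.num_partitions

    def __eq__(self, other: object) -> bool:
        # Same hash function + same partition count => same key placement.
        return (isinstance(other, HashPartitioner)
                and other.num_partitions == self.num_partitions)


def partition_by(records: List[Record], p: HashPartitioner) -> Partitions:
    """Split a dataset into partitions (on a cluster, this is the shuffle)."""
    parts: Partitions = [[] for _ in range(p.num_partitions)]
    for key, value in records:
        parts[p.partition(key)].append((key, value))
    return parts


def co_partitioned_join(left: Partitions, right: Partitions,
                        p_left: HashPartitioner, p_right: HashPartitioner
                        ) -> List[Tuple[Hashable, Tuple[str, str]]]:
    """Join two datasets partition by partition, assuming co-partitioning."""
    if p_left != p_right:
        raise ValueError("not co-partitioned: a shuffle would be required")
    out: List[Tuple[Hashable, Tuple[str, str]]] = []
    for lpart, rpart in zip(left, right):
        # Matching keys are guaranteed to share a partition index,
        # so each pair of partitions is joined locally.
        index: Dict[Hashable, List[str]] = defaultdict(list)
        for k, w in rpart:
            index[k].append(w)
        for k, v in lpart:
            for w in index.get(k, []):
                out.append((k, (v, w)))
    return out


users = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
orders = [(1, "book"), (3, "lamp"), (3, "pen"), (5, "mug")]
p = HashPartitioner(3)
joined = co_partitioned_join(partition_by(users, p), partition_by(orders, p), p, p)
print(sorted(joined))
```

The equality check on the partitioners is the whole trick: if the two sides had been split differently, matching keys could sit at different indices and both datasets would have to be reshuffled first, roughly the cost a reduce-side join in plain MapReduce always pays.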