An M/R job is, in itself, a one-shot job. Making it iterative is what a higher-level controller does, by running the job several times and pointing each run at the right input. That bit isn't part of M/R, so I don't think you would accomplish this goal by implementing something *under* the M/R API.
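Since the "iteration lives outside M/R" point may be easier to see in code, here is a minimal sketch of the kind of driver loop I mean. It uses the stock identity Mapper/Reducer just so it compiles on its own; the fixed loop count, the paths, and the missing convergence check are all placeholder assumptions, not a real algorithm.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);

    // The "higher-level controller" is just this loop: it submits a fresh,
    // one-shot M/R job each pass and rewires output -> input itself.
    for (int i = 0; i < 10; i++) {
      Path output = new Path(args[1] + "/iter-" + i);

      Job job = Job.getInstance(conf, "iteration-" + i);
      job.setJarByClass(IterativeDriver.class);
      job.setMapperClass(Mapper.class);    // identity map/reduce stand in for
      job.setReducerClass(Reducer.class);  // whatever the real iteration does
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, input);
      FileOutputFormat.setOutputPath(job, output);

      if (!job.waitForCompletion(true)) {
        System.exit(1);  // each submission finishes or fails on its own
      }

      input = output;    // point the next job at this job's output
      // a real controller would also decide here whether to stop (e.g. via counters)
    }
  }
}

In Spark, by contrast, that loop is just ordinary driver code over a cached RDD, which is where the iterative-workflow advantage in the original question comes from.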
M/Rs still get written, but I think most people serious about it are already using higher-level APIs like Apache Crunch or Cascading. For those who haven't seen it, Crunch's abstraction bears a lot of resemblance to the Spark model -- handles on remote collections. So *the reverse* of this suggestion (i.e. a Spark-ish API on M/R) is basically Crunch, or Scrunch if you like Scala.

I know Josh Wills has put work into getting Crunch to operate *on top of Spark*, even. That might be of interest to the original idea of getting a possibly more familiar API, for some current Hadoop devs, running on top of Spark. (Josh tells me it also enables a few tricks that are hard in Spark.)

--
Sean Owen | Director, Data Science | London

On Sat, Feb 1, 2014 at 11:57 PM, nileshc <[email protected]> wrote:
> This might seem like a silly question, so please bear with me. I'm not sure
> about it myself, just would like to know if you think it's utterly
> unfeasible or not, and if it's at all worth doing.
>
> Does anyone feel like it'll be a good idea to build some sort of a library
> that allows us to write code for Spark using the usual bloated Hadoop API?
> This is for the people who want to run their existing MapReduce code (with
> NIL or minimal adjustments) with Spark to take advantage of its speed and
> its better support for iterative workflows.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Hadoop-MapReduce-on-Spark-tp1110.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
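To make the "handles on remote collections" comparison above concrete, here is a rough sketch of a word count in the Crunch style, written from memory of its getting-started example; treat the paths, class name, and exact method signatures as assumptions that may vary by version. The shape is the point: you transform a PCollection handle and let the pipeline plan the underlying M/R jobs.

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

public class CrunchWordCount {
  public static void main(String[] args) {
    // The pipeline plans and runs M/R jobs behind the scenes.
    Pipeline pipeline = new MRPipeline(CrunchWordCount.class);

    // A PCollection is a handle on a distributed collection, much like an RDD.
    PCollection<String> lines = pipeline.readTextFile(args[0]);

    PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
      @Override
      public void process(String line, Emitter<String> emitter) {
        for (String word : line.split("\\s+")) {
          emitter.emit(word);
        }
      }
    }, Writables.strings());

    // Transformations compose lazily; count() just yields another handle.
    PTable<String, Long> counts = words.count();

    pipeline.writeTextFile(counts, args[1]);
    pipeline.done();   // triggers planning and execution of the M/R job(s)
  }
}

That is close to a line-for-line match for how the same program looks over Spark's RDDs, which is why running Crunch itself on a Spark backend (the work Josh has been doing) is a fairly natural fit.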
