Hi,

Sorry for the confusion.

So let me rephrase my question.

Why does Spark have to write the intermediate data to disk when there is a
shuffle dependency? Can't the communication happen directly, as it does in
Giraph?
And does data get written on the reducer side as well?

Again, please feel free to correct me if my understanding is incorrect.

Regards,
SB


On Fri, Jan 24, 2014 at 3:44 AM, Jey Kottalam <[email protected]> wrote:

> Hi Suman,
>
> Spark does indeed do in-memory computation, and does not require
> spilling to disk after every map task. Could you explain where you
> "see that intermediate map outputs gets written to disk"? Perhaps
> you're seeing some intermediate results during a shuffle phase? In
> that case, you may want to look into the
> "spark.shuffle.consolidateFiles" option:
> https://spark.incubator.apache.org/docs/0.8.1/configuration.html
>
> -Jey
>
> On Thu, Jan 23, 2014 at 1:10 PM, suman bharadwaj <[email protected]> wrote:
> > Hi,
> >
> > I might be wrong, but need your help.
> >
> > My understanding is that Giraph doesn't write the intermediate data to
> > disk while sending messages to different machines. But in Spark, I see
> > that intermediate map outputs gets written to disk. Why does Spark write
> > intermediate data to disk?
> >
> > What happens at the reducer side? Does Spark write the data to disk
> > again? How does it differ from Hadoop MR?
> >
> > Can't Spark communicate everything in memory?
> >
> > If my understanding is wrong, please do correct me.
> >
> > Regards,
> > Suman Bharadwaj S
>
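
For reference, here is a minimal sketch of turning on the
`spark.shuffle.consolidateFiles` option that Jey points to. It assumes a
Spark 0.8.x deployment, where Spark properties are plain Java system
properties that can be passed to the JVM with `-D`, for example through
`SPARK_JAVA_OPTS` in `conf/spark-env.sh`. Calling `System.setProperty` in
the driver before creating the `SparkContext` is the other documented route.
Please check the linked configuration page for your exact version. Note that
the option only consolidates the shuffle output into fewer files. Map
outputs are still written out during the shuffle, so the shuffle does not
become purely in-memory.

```shell
# conf/spark-env.sh (assumed Spark 0.8.x setup)
# Spark 0.8.x reads its properties from Java system properties, so the
# option is passed to the JVM as a -D flag via SPARK_JAVA_OPTS.
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.shuffle.consolidateFiles=true"
export SPARK_JAVA_OPTS
```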
