Re: Is Intermediate data written to disk?

bharath v Wed, 03 Feb 2010 04:18:39 -0800

Dmitriy,

Suppose the joins use different keys, e.g. (A on a, B on b1) and
(B on b2, C on c). Then the intermediate result of joining A and B has to
be stored on disk, right?


Thanks

On Wed, Feb 3, 2010 at 5:18 PM, Dmitriy Ryaboy <[email protected]> wrote:

> if you explicitly join 3 or more relations with a single command ("d =
> join a on id, b on id, c on id;"), a and b will be buffered for each
> key, while c, the rightmost relation, will be streamed.
>
> This is on a per-reducer basis. There is of course a whole lot of IO
> going on for getting from the Mappers to Reducers, but none of it is
> the intermediate result of joining A to B.
>
> -Dmitriy
>
> On Tue, Feb 2, 2010 at 10:52 PM, bharath v
> <[email protected]> wrote:
> > Hi,
> >
> > I have a small doubt about how Pig handles queries that join more than
> > 2 tables.
> >
> > Suppose we have 3 tables A, B, C and the plan is "((AB)C)".
> > We can join A and B in a MapReduce job and join the resulting table with
> > "C". I have a doubt whether the result of "AB" is stored to disk before
> > joining with C, or is streamed directly into the join with C (I don't
> > know how, just a guess).
> >
> > Any help is appreciated,
> >
> > Thanks
> >
>
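
The per-reducer behaviour Dmitriy describes above (buffer a and b for each
key, stream c) can be sketched roughly as follows. This is only an
illustration in Python, not Pig's actual code: the function name
reduce_join, the (relation_index, row) tagging, and the assumption that rows
for a key arrive ordered by relation index are all made up for the example.

```python
from itertools import product


def reduce_join(key, tagged_rows, num_relations):
    """Inner-join the rows that share one key inside a single reducer.

    tagged_rows yields (relation_index, row) pairs and is ASSUMED to be
    sorted by relation_index, so rows of the rightmost relation come last.
    """
    last = num_relations - 1
    # One in-memory buffer per relation except the rightmost one.
    buffers = [[] for _ in range(last)]

    for idx, row in tagged_rows:
        if idx < last:
            # Rows of a, b, ... are held in memory for this key.
            buffers[idx].append(row)
        else:
            # Rows of the rightmost relation are never buffered: each one
            # is combined with the buffered rows and emitted immediately.
            for combo in product(*buffers):
                yield (key,) + combo + (row,)


if __name__ == "__main__":
    rows = [(0, "a1"), (0, "a2"), (1, "b1"), (2, "c1"), (2, "c2")]
    for joined in reduce_join(7, iter(rows), 3):
        print(joined)
```

Nothing in this sketch materialises an "AB" result: the combinations of a
and b rows are produced on the fly for every streamed c row. It only covers
the case where all relations are joined on the same key in one command.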
