Ah… good point.  I should have just done an explain :)

On Mon, Aug 29, 2011 at 1:56 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Better than reading source, try the explain plan.
>
> [dmitriy@host~]$ pig -x local
> 2011-08-29 20:49:02,137 [main] INFO  org.apache.pig.Main - Logging error messages to: /var/log/pig/pig_1314650942119.log
> 2011-08-29 20:49:02,284 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> grunt> l = load '/etc/hosts' as (foo);
> grunt> l2 = load '/etc/hosts' as (foo);
> grunt> unioned = union l, l2;
> grunt> g = group unioned by foo;
> grunt> explain g;
>
> ... snip ...
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-34
> Map Plan
> g: Local Rearrange[tuple]{bytearray}(false) - scope-29
> |   |
> |   Project[bytearray][0] - scope-30
> |
> |---unioned: Union[bag] - scope-26
>    |
>    |---l: Load(/etc/hosts:org.apache.pig.builtin.PigStorage) - scope-24
>    |
>    |---l2: Load(/etc/hosts:org.apache.pig.builtin.PigStorage) - scope-25--------
> Reduce Plan
> g: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-31
> |
> |---g: Package[tuple]{bytearray} - scope-28--------
> Global sort: false
> ----------------
>
> Looks like it works the way you'd expect it to work -- it just reads from
> two sources and tries to apply the same schema to them.
>
> If you play around with the script you are describing, you will discover
> other fun things, such as the fact that it's smart enough to apply filters
> before unioning even if your script has a single filter on the unioned
> relation.
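>
> A minimal sketch of that experiment, assuming the same /etc/hosts input.
> The chararray type and the 'localhost' pattern are made up for
> illustration:
>
> grunt> l = load '/etc/hosts' as (foo:chararray);
> grunt> l2 = load '/etc/hosts' as (foo:chararray);
> grunt> unioned = union l, l2;
> grunt> f = filter unioned by foo matches '.*localhost.*';
> grunt> explain f;
>
> If the filter is pushed down as described, the plan should show a Filter
> under each Load rather than a single one above the Union.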
>
> D
>
> On Mon, Aug 29, 2011 at 1:31 PM, Kevin Burton <[email protected]> wrote:
>
> > How is UNION implemented?
> >
> > Does it read from two source files or does it create a temporary file by
> > reading the N source files/relations and then writing a new temp file
> > which is then read from?
> >
> > I could probably spend an hour looking through the source to figure this
> > out, but I figured I would just ask.
> >
> > --
> >
> > Founder/CEO Spinn3r.com
> >
> > Location: *San Francisco, CA*
> > Skype: *burtonator*
> >
> > Skype-in: *(415) 871-0687*
> >
>



