Rather than reading the source, try the explain plan.

[dmitriy@host~]$ pig -x local
2011-08-29 20:49:02,137 [main] INFO  org.apache.pig.Main - Logging error messages to: /var/log/pig/pig_1314650942119.log
2011-08-29 20:49:02,284 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> l = load '/etc/hosts' as (foo);
grunt> l2 = load '/etc/hosts' as (foo);
grunt> unioned = union l, l2;
grunt> g = group unioned by foo;
grunt> explain g;

... snip ...
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-34
Map Plan
g: Local Rearrange[tuple]{bytearray}(false) - scope-29
|   |
|   Project[bytearray][0] - scope-30
|
|---unioned: Union[bag] - scope-26
    |
    |---l: Load(/etc/hosts:org.apache.pig.builtin.PigStorage) - scope-24
    |
    |---l2: Load(/etc/hosts:org.apache.pig.builtin.PigStorage) - scope-25--------
Reduce Plan
g: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-31
|
|---g: Package[tuple]{bytearray} - scope-28--------
Global sort: false
----------------

Looks like it works the way you'd expect: both loads feed the union inside
the same map plan, so it just reads from the two sources directly (no
intermediate temp file) and tries to apply the same schema to them.

If you play around with the script you are describing, you will discover
other fun things, such as the fact that it's smart enough to apply filters
before unioning even if your script has a single filter on the unioned
relation.
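
For instance, here is a minimal sketch of that experiment, reusing the
relations from the session above. The `foo is not null` condition and the
alias `filtered` are just illustrative choices, not anything from your
script:

```pig
-- Same two loads and union as in the session above.
l = load '/etc/hosts' as (foo);
l2 = load '/etc/hosts' as (foo);
unioned = union l, l2;

-- A single filter written against the unioned relation
-- (illustrative condition; any filter on foo would do).
filtered = filter unioned by foo is not null;

-- Print the logical, physical and map-reduce plans for the filtered relation.
explain filtered;
```

If the filter gets pushed as described, the Filter operators in the explain
output should sit below the Union, one per Load, rather than above it.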

D

On Mon, Aug 29, 2011 at 1:31 PM, Kevin Burton <[email protected]> wrote:

> How is UNION implemented?
>
> Does it read from two source files or does it create a temporary file by
> reading the N source files/relations and then writing a new temp file which
> is then read from?
>
> I could probably spend an hour looking through the source to figure this
> out
> but I figured I would just ask.
>
> --
>
> Founder/CEO Spinn3r.com
>
> Location: *San Francisco, CA*
> Skype: *burtonator*
>
> Skype-in: *(415) 871-0687*
>
