Rob, it's just a join.

a = load 'rel1' using FooStorage() as (id, filename);
b = load 'rel2' using FooStorage() as (id, filename);
c = join a by filename, b by filename;
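
To get back just the filename column, as in your SQL, project it out of
the joined relation and store that. Untested sketch; "query" and
'Output.pig' are the names from your mail, and a::filename picks the
copy of the field that came from a:

query = foreach c generate a::filename;
store query into 'Output.pig' using PigStorage();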

Rows that don't match won't make it into c.
If you DO want them in, add "outer" (left, right, or full) to the join
for the relations whose non-matching rows you want retained; the fields
that would have come from the other relation are filled in with nulls.
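
For example, to keep every row of a whether or not it has a match in b
(a sketch, assuming your Pig version supports the outer keywords; "d" is
just an alias I made up):

d = join a by filename left outer, b by filename;

Swap "left" for "right" to keep b's unmatched rows instead, or "full" to
keep both.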

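Naturally, since Pig can do it, MR can do it: Pig compiles the join into
MapReduce jobs. By hand, the usual approach is a reduce-side join: the
mappers emit filename as the key, tagged with the source table, and the
reducer outputs the keys it saw from both tables.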

-D

On Tue, Jan 12, 2010 at 2:57 PM, Rob Stewart
<[email protected]> wrote:
> Hi folks,
>
> I have a somewhat obvious question that needs asking (for my sake).
>
> Pig can do Joins, I realise that. But take for example:
> Table_1
> ----------------------
> | ID | fileName |
>  1     foo.dat
>  2     bar.dat
>  3     harry.dat
>
> Table_2
> ----------------------
> | ID | fileName |
>  1      tom.dat
>  2      bar.dat
>  3      gamma.dat
>
>
> SQL Syntax for conditional select:
> "select t1.fileName from Table_1 t1, Table_2 t2 where t1.fileName =
> t2.fileName"
>
> Result
> --------
> bar.dat
>
> How is such a query represented in Pig?
> tableOne = LOAD 'input1.dat' USING PigStorage() AS (id:int,
> filename:chararray);
> tableTwo = LOAD 'input2.dat' USING PigStorage() AS (id:int,
> filename:chararray);
> [Now what??]
> STORE query INTO 'Output.pig' USING PigStorage();
>
>
> As a bonus question, can anybody tell me if this sort of conditional select
> query can be written in Java MapReduce?
>
> thanks,
>
> Rob Stewart
>
