Rob, it's just a join.

a = load 'rel1' using FooStorage() as (id, filename);
b = load 'rel2' using FooStorage() as (id, filename);
c = join a by filename, b by filename;

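If you only want the filename column, as in the SQL select, a projection after the join should do it. A rough sketch: the alias d is made up here, the output path is the one from the original script, and a::filename is how Pig disambiguates a field name that exists on both sides of the join.

```pig
-- keep just the filename column from the joined relation c
d = foreach c generate a::filename;
-- write the result out, as in the original script
store d into 'Output.pig' using PigStorage();
```
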
Rows that don't match won't make it. If you DO want them to make it in, you need to use "outer" for the relations whose non-matching rows you want retained (the rest of the fields in the resulting relation will be filled in with nulls). Naturally, since Pig can do it, MR can do it.

-D

On Tue, Jan 12, 2010 at 2:57 PM, Rob Stewart <[email protected]> wrote:
> Hi folks,
>
> I have a somewhat obvious question, that needs asking (for my sakes).
>
> Pig can do Joins, I realise that. But take for example:
> Table_1
> ----------------------
> | ID | fileName |
>   1    foo.dat
>   2    bar.dat
>   3    harry.dat
>
> Table_2
> ----------------------
> | ID | fileName |
>   1    tom.dat
>   2    bar.dat
>   3    gamma.dat
>
>
> SQL Syntax for conditional select:
> "select t1.fileName from Table_1 t1, Table_2 t2 where t1.fileName =
> t2.fileName"
>
> Result
> --------
> bar.dat
>
> How is such a query represented in Pig?
> tableOne = LOAD 'input1.dat' USING PigStorage() AS (id:int,
> filename:chararray);
> tableTwo = LOAD 'input2.dat' USING PigStorage() AS (id:int,
> filename:chararray);
> [Now what??]
> STORE query INTO 'Output.pig' USING PigStorage();
>
>
> As a bonus question, can anybody tell me if this sort of conditional select
> query is possible writing in Java MapReduce?
>
> thanks,
>
> Rob Stewart
>
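
For the "outer" variant mentioned in the reply, a sketch along these lines should work, assuming a Pig version that supports the outer keyword on join and that a and b are loaded with schemas as in the reply. The aliases c_left and c_full are made up for illustration.

```pig
-- keep every row of a; b's fields are null where no filename matches
c_left = join a by filename left outer, b by filename;
-- keep non-matching rows from both a and b
c_full = join a by filename full outer, b by filename;
```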
