Hi, yeah I thought so, the only slightly confusing issue is that the output would be: bar.dat bar.dat
? (i.e. - showing you a.filename b.filename ) ? Rob. 2010/1/12 Dmitriy Ryaboy <[email protected]> > Rob, it's just a join. > > a = load 'rel1' using FooStorage() as (id, filename); > b = load 'rel2' using FooStorage() as (id, filename); > c = join a by filename, b by filename; > > Rows that don't match won't make it. > If you DO want them to make it in, you need to use "outer" for the > relations whose non-matching rows you want retained (the rest of the > fields in the resulting relation will be filled in with nulls). > > Naturally, since Pig can do it, MR can do it. > > -D > > On Tue, Jan 12, 2010 at 2:57 PM, Rob Stewart > <[email protected]> wrote: > > Hi folks, > > > > I have a somewhat obvious question, that needs asking (for my sakes). > > > > Pig can do Joins, I realise that. But take for example: > > Table_1 > > ---------------------- > > | ID | fileName | > > 1 foo.dat > > 2 bar.dat > > 3 harry.dat > > > > Table_2 > > ---------------------- > > | ID | fileName | > > 1 tom.dat > > 2 bar.dat > > 3 gamma.dat > > > > > > SQL Syntax for conditional select: > > "select t1.fileName from Table_1 t1, Table_2 t2 where t1.fileName = > > t2.fileName" > > > > Result > > -------- > > bar.dat > > > > How is such a query represented in Pig? > > tableOne = LOAD 'input1.dat' USING PigStorage() AS (id:int, > > filename:chararray); > > tableTwo = LOAD 'input2.dat' USING PigStorage() AS (id:int, > > filename:chararray); > > [Now what??] > > STORE query INTO 'Output.pig' USING PigStorage(); > > > > > > As a bonus question, can anybody tell me if this sort of conditional > select > > query is possible writing in Java MapReduce? > > > > thanks, > > > > Rob Stewart > > >
