As a follow-up to what Dmitriy described: just add a projection to pick the columns you need.
c = join a by filename, b by filename PARALLEL $MY_PARALLELISM;
--- Please check this syntax though with pig latin docs.
d = foreach c generate a::filename; --- Or anything else you want to pick.

If you need to, just do a distinct on d to remove duplicates, though this might result in more MR jobs.
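
Put together with Rob's original script further down the thread, the whole thing might look roughly like the sketch below. It reuses his tableOne, tableTwo and query aliases and file names; the joined and names aliases are made up here for illustration, and the input files are assumed to be tab-delimited (the PigStorage default). Please check the syntax against the Pig Latin docs for your version.

```pig
-- Load both relations with the schema from Rob's script.
tableOne = LOAD 'input1.dat' USING PigStorage() AS (id:int, filename:chararray);
tableTwo = LOAD 'input2.dat' USING PigStorage() AS (id:int, filename:chararray);

-- Inner join: only rows whose filename appears in both relations survive.
joined = JOIN tableOne BY filename, tableTwo BY filename;

-- Project a single column so the output is not "bar.dat bar.dat".
names = FOREACH joined GENERATE tableOne::filename AS filename;

-- Optional: drop duplicates if a filename can match more than once.
query = DISTINCT names;

STORE query INTO 'Output.pig' USING PigStorage();
```

If filenames are unique within each input, the DISTINCT step can be dropped and names stored directly.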


- Mridul



Rob Stewart wrote:
Hi, yeah I thought so,

the only slightly confusing issue is that the output would be:
bar.dat bar.dat

? (i.e. - showing you a.filename b.filename ) ?

Rob.



2010/1/12 Dmitriy Ryaboy <[email protected]>

Rob, it's just a join.

a = load 'rel1' using FooStorage() as (id, filename);
b = load 'rel2' using FooStorage() as (id, filename);
c = join a by filename, b by filename;

Rows that don't match won't make it.
If you DO want them to make it in, you need to use "outer" for the
relations whose non-matching rows you want retained (the rest of the
fields in the resulting relation will be filled in with nulls).
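
As a rough sketch of the outer variants, assuming the a and b relations loaded above (the aliases left_c and full_c are made up for illustration, and the keywords are worth checking against the Pig Latin docs for your version):

```pig
-- Keep every row of a; b's fields are null where no filename matches.
left_c = join a by filename left outer, b by filename;

-- Keep non-matching rows from both sides.
full_c = join a by filename full outer, b by filename;
```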

Naturally, since Pig can do it, MR can do it.

-D

On Tue, Jan 12, 2010 at 2:57 PM, Rob Stewart
<[email protected]> wrote:
Hi folks,

I have a somewhat obvious question that needs asking (for my sake).

Pig can do Joins, I realise that. But take for example:
Table_1
----------------------
| ID | fileName |
 1     foo.dat
 2     bar.dat
 3     harry.dat

Table_2
----------------------
| ID | fileName |
 1      tom.dat
 2      bar.dat
 3      gamma.dat


SQL Syntax for conditional select:
"select t1.fileName from Table_1 t1, Table_2 t2 where t1.fileName =
t2.fileName"

Result
--------
bar.dat

How is such a query represented in Pig?
tableOne = LOAD 'input1.dat' USING PigStorage() AS (id:int,
filename:chararray);
tableTwo = LOAD 'input2.dat' USING PigStorage() AS (id:int,
filename:chararray);
[Now what??]
STORE query INTO 'Output.pig' USING PigStorage();


As a bonus question, can anybody tell me if this sort of conditional
select query is possible to write in Java MapReduce?

thanks,

Rob Stewart

