Rob,
I get confused how the fields are named sometimes, especially after a
series of joins and groups. The describe command helps (and the
illustrate command, which unfortunately doesn't support some
operators).

On Tue, Jan 12, 2010 at 3:27 PM, Mridul Muralidharan
<[email protected]> wrote:
>
> As a suffix to what Dmitriy described - just add a project to pick the
> columns you need.
> c = join a by filename, b by filename PARALLEL $MY_PARALLELISM;
> --- Please check this syntax though with pig latin docs.
> d = foreach c generate a::filename; --- Or anything else you want to pick.
>
> if you need, just do a distinct of d's output to remove duplicates ...
> though this might result in more MR jobs.
>
>
> - Mridul
>
>
>
> Rob Stewart wrote:
>>
>> Hi, yeah I thought so,
>>
>> the only slightly confusing issue is that the output would be:
>> bar.dat bar.dat
>>
>> ? (i.e. - showing you a.filename b.filename ) ?
>>
>> Rob.
>>
>>
>>
>> 2010/1/12 Dmitriy Ryaboy <[email protected]>
>>
>>> Rob, it's just a join.
>>>
>>> a = load 'rel1' using FooStorage() as (id, filename);
>>> b = load 'rel2' using FooStorage() as (id, filename);
>>> c = join a by filename, b by filename;
>>>
>>> Rows that don't match won't make it.
>>> If you DO want them to make it in, you need to use "outer" for the
>>> relations whose non-matching rows you want retained (the rest of the
>>> fields in the resulting relation will be filled in with nulls).
>>>
>>> Naturally, since Pig can do it, MR can do it.
>>>
>>> -D
>>>
>>> On Tue, Jan 12, 2010 at 2:57 PM, Rob Stewart
>>> <[email protected]> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> I have a somewhat obvious question, that needs asking (for my sakes).
>>>>
>>>> Pig can do Joins, I realise that. But take for example:
>>>> Table_1
>>>> ----------------------
>>>> | ID | fileName |
>>>>  1     foo.dat
>>>>  2     bar.dat
>>>>  3     harry.dat
>>>>
>>>> Table_2
>>>> ----------------------
>>>> | ID | fileName |
>>>>  1      tom.dat
>>>>  2      bar.dat
>>>>  3      gamma.dat
>>>>
>>>>
>>>> SQL Syntax for conditional select:
>>>> "select t1.fileName from Table_1 t1, Table_2 t2 where t1.fileName =
>>>> t2.fileName"
>>>>
>>>> Result
>>>> --------
>>>> bar.dat
>>>>
>>>> How is such a query represented in Pig?
>>>> tableOne = LOAD 'input1.dat' USING PigStorage() AS (id:int,
>>>> filename:chararray);
>>>> tableTwo = LOAD 'input2.dat' USING PigStorage() AS (id:int,
>>>> filename:chararray);
>>>> [Now what??]
>>>> STORE query INTO 'Output.pig' USING PigStorage();
>>>>
>>>>
>>>> As a bonus question, can anybody tell me if this sort of conditional
>>>
>>> select
>>>>
>>>> query is possible writing in Java MapReduce?
>>>>
>>>> thanks,
>>>>
>>>> Rob Stewart
>>>>
>
>

Reply via email to