It really depends on what your 'first' means (first by what?)
if you just want a single row, you can group the filter results like this:

a = group filtered by (id1, id2);
b = foreach a { x = limit filtered 1; generate FLATTEN(x) as (...); }

if "first" implies some kind of order, you can change "x=limit
filtered 1;"  to "y = order filtered by xxx; x = limit y 1; "

Shawn

On Mon, Sep 12, 2011 at 4:23 PM, Marek Miglinski <[email protected]> wrote:
> Any clues anyone? I still didn't get anything myself, thinking...
>
>
>
> Sincerely,
> Marek M.
>
> -----Original Message-----
> From: yonghu [mailto:[email protected]]
> Sent: Monday, September 12, 2011 10:21 PM
> To: [email protected]
> Subject: Re: JOINing two inputs
>
> Sorry, I didn't understand you right. I didn't think just use Pig operator 
> can finish this problem. You can first use cogroup operator to group the two 
> inputs together. Then apply a UDF to each tuple.
>
> On Mon, Sep 12, 2011 at 5:35 PM, Marek Miglinski <[email protected]>wrote:
>
>> Thanks for fast reply ;)
>>
>> Ok, I've done this:
>> recordJoined = JOIN record1 BY (game_id, user_id), record2 BY
>> (game_id, user_id);
>>
>> Now I have:
>> record1.epoch::50, record1.game_id::434, record1.user_id::990,
>> record2.epoch::67, record2.param1::pop record1.epoch::50,
>> record1.game_id::434, record1.user_id::990, record2.epoch::43,
>> record2.param1::wow record1.epoch::50, record1.game_id::434,
>> record1.user_id::990, record2.epoch::42, record2.param1::slow
>> record1.epoch::50, record1.game_id::434, record1.user_id::990,
>> record2.epoch::23, record2.param1::fast (Other data)
>> record1.epoch::67, record1.game_id::564, record1.user_id::889,
>> record2.epoch::44, record2.param1::pop ...
>>
>> Now what?
>> I can do this:
>> recordFiltered = FILTER recordJoined BY record1::epoch >=
>> record2::epoch;
>>
>> It will give me:
>> record1.epoch::50, record1.game_id::434, record1.user_id::990,
>> record2.epoch::43, record2.param1::wow record1.epoch::50,
>> record1.game_id::434, record1.user_id::990, record2.epoch::42,
>> record2.param1::slow record1.epoch::50, record1.game_id::434,
>> record1.user_id::990, record2.epoch::23, record2.param1::fast (Other
>> data) record1.epoch::67, record1.game_id::564, record1.user_id::889,
>> record2.epoch::44, record2.param1::pop ...
>>
>> Still not what I want, I need:
>> record1.epoch::50, record1.game_id::434, record1.user_id::990,
>> record2.epoch::43, record2.param1::wow (Other data) record1.epoch::67,
>> record1.game_id::564, record1.user_id::889, record2.epoch::44,
>> record2.param1::pop ...
>>
>>
>>
>> Sincerely,
>> Marek M.
>>
>> ________________________________________
>> From: yonghu [[email protected]]
>> Sent: Monday, September 12, 2011 5:49 PM
>> To: [email protected]
>> Subject: Re: JOINing two inputs
>>
>> I think you can first use join and then for each tuple using filter.
>>
>> On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[email protected]
>> >wrote:
>>
>> > Hi,
>> >
>> > I have a serious task to finish, hope somebody will help me... I
>> > have two inputs with data:
>> >
>> > record1:
>> > epoch,
>> > game_id,
>> > user_id,
>> > other data
>> >
>> > record2:
>> > epoch,
>> > game_id,
>> > user_id,
>> > other data
>> >
>> > Now I need to JOIN record1 with record2 BY game_id, oper_id,
>> > user_id, epoch. BUT! epoch in record2 must be FIRST found data and
>> > it should be < than epoch in record1.
>> >
>> > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY
>> > (game_id, user_id); + add something like... CLOSEST(WHERE
>> > record1::epoch < record2::epoch);
>> >
>> > So for example:
>> >
>> > record1:
>> > epoch::50
>> > game_id::434
>> > user_id::990
>> >
>> > record2:
>> > epoch::67
>> > game_id::434
>> > user_id::990
>> > param1::pop
>> >
>> > record2:
>> > epoch::43
>> > game_id::434
>> > user_id::990
>> > param1::wow
>> >
>> > record2:
>> > epoch::42
>> > game_id::434
>> > user_id::990
>> > param1::slow
>> >
>> > record2:
>> > epoch::23
>> > game_id::434
>> > user_id::990
>> > param1::fast
>> >
>> >
>> > The result should be - record1.epoch::50, record1.game_id::434,
>> > record1.user_id::990, record2.epoch::43, record2.param1::wow and ...
>> >
>> > Is it possible to accomplish through PIG? Using JOIN or using FOREACH?
>> >
>> >
>> >
>> > Sincerely,
>> > Marek M.
>> >
>> >
>> >
>>
>

Reply via email to