Ok... I've done it :P Thanks for your help. I did it through JOIN, with the help of a new key field (consisting of txUser and txEpoch) that I use later to identify unique records for GROUPing.
Sincerely,
Marek M.

________________________________________
From: Marek Miglinski [[email protected]]
Sent: Wednesday, September 14, 2011 9:52 AM
To: [email protected]
Subject: RE: Dumb question guys

Thanks for your reply. I can't use JOIN, and I will explain why. Here is my data:

UP:
9,user1,sam1
5,user1,sam2
3,user1,sam3
9,user2,flin

TX:
7,user1,wow
9,user2,pop

I need to join tx with up by user and closest epoch (first field). If I do a JOIN (BY user), I get:

7,user1,wow,9,user1,sam1
7,user1,wow,5,user1,sam2
7,user1,wow,3,user1,sam3
9,user2,pop,9,user2,flin

Now I can't filter the records properly in FOREACH, because I don't know whether the current input row is the one I need, ok? So I do COGROUP and get:

{(7,user1,wow)}, {(9,user1,sam1), (5,user1,sam2), (3,user1,sam3)}
{(9,user2,pop)}, {(9,user2,flin)}

Now I can FILTER, ORDER and LIMIT through FOREACH because I have all the data in one row:

recordExtract = FOREACH recordGroup {
    recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
    recordOrdered = ORDER recordFiltered BY upEpoch DESC;
    recordLimited = LIMIT recordOrdered 1;
    GENERATE recordLimited;
}

So if I can get tx.txEpoch properly, I will get the desired output:

7,user1,wow,5,user1,sam2 (upEpoch 5 is closest to txEpoch 7)
9,user2,pop,9,user2,flin (upEpoch 9 is closest to txEpoch 9)

Do you have any clues?

________________________________________
From: Xiaomeng Wan [[email protected]]
Sent: Tuesday, September 13, 2011 11:26 PM
To: [email protected]
Subject: Re: Dumb question guys

tx is a bag; you cannot use it that way unless it is a scalar. Not sure about the logic here, but it looks like you should use a join rather than a cogroup:

recordGroup = JOIN up BY upInstance, tx BY txInstance;
recordFiltered = FILTER recordGroup BY upEpoch < txEpoch;

Shawn

On Tue, Sep 13, 2011 at 11:54 AM, Marek Miglinski <[email protected]> wrote:
> Hey all, 4 hours of true torture, hope you will help me (the task is easy)
>
> up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long,
> upInstance:chararray, upKeyword:chararray);
> tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long,
> txInstance:chararray, txKeyword:chararray);
> recordGroup = COGROUP up BY (upInstance), tx BY (txInstance);
>
> recordExtract = FOREACH recordGroup {
>     recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
>     recordLimited = LIMIT recordFiltered 1;
>     GENERATE
>         recordLimited
>     ;
> }
>
> How do I point PIG to my tx input with the txEpoch field (from recordGroup)?
> tx::txEpoch, tx.txEpoch, txEpoch, recordGroup::tx.txEpoch don't work...
>
> Always the same, with tx::txEpoch - "ERROR 1000: Error during parsing.
> Invalid alias: tx::txEpoch in {upEpoch: long,upInstance: chararray,upKeyword:
> chararray}"
>
> Or with tx.txEpoch (I know it takes tx = LOAD as a source, but I need
> recordGroup::tx.txEpoch!) - "ERROR 2997: Unable to recreate exception from
> backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 0:
> Scalar has more than one row in the output. 1st : (1314835200050,99,sam), 2nd
> :(1314835200079,99,flin)"
>
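
For readers landing on this thread: the JOIN-then-GROUP approach described in the final reply at the top might look roughly like the script below. This is only a sketch under stated assumptions, not the script that was actually used. The relation names joined, candidates, grouped and closest are made up here, and the composite key is assumed to be (txInstance, txEpoch), which only works if that pair identifies a tx row uniquely. The filter uses <= rather than the < from the thread, on the assumption that an up row with the same epoch as the tx row should match, as the desired output for user2 implies.

```pig
up = LOAD '/up.log' USING PigStorage(',')
     AS (upEpoch:long, upInstance:chararray, upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',')
     AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);

-- Pair every tx row with every up row of the same user.
joined = JOIN tx BY txInstance, up BY upInstance;

-- Keep only the up rows that are not later than the tx row.
-- ASSUMPTION: <= (not <) so that equal epochs count as a match.
candidates = FILTER joined BY upEpoch <= txEpoch;

-- Composite key: one group per tx row.
-- ASSUMPTION: (txInstance, txEpoch) is unique within tx.
grouped = GROUP candidates BY (txInstance, txEpoch);

-- Within each group, keep the joined row with the largest upEpoch,
-- i.e. the up row closest to (but not after) the tx row.
closest = FOREACH grouped {
    ordered = ORDER candidates BY upEpoch DESC;
    nearest = LIMIT ordered 1;
    GENERATE FLATTEN(nearest);
};

DUMP closest;
```

The reason this sidesteps the original error is that after the JOIN, txEpoch is an ordinary field on every row rather than a field inside a bag, so it can be compared with upEpoch directly. On the sample UP and TX data from the thread, this should leave one row per tx record, matching the desired output listed above; the order of the rows is not guaranteed.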
