Thanks for your reply,
I can't use JOIN, and here is why. This is my data:
UP:
9,user1,sam1
5,user1,sam2
3,user1,sam3
9,user2,flin
TX:
7,user1,wow
9,user2,pop
I need to join tx with up by user and by the closest up epoch (the first field) that is not later than the tx epoch. If I JOIN BY user, I get:
7,user1,wow,9,user1,sam1
7,user1,wow,5,user1,sam2
7,user1,wow,3,user1,sam3
9,user2,pop,9,user2,flin
Now I can't filter the records properly in a FOREACH, because each row holds a single up record, so I can't tell whether the current row is the one I need.
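A plain-Python sketch (not Pig) of that JOIN BY user on the sample data makes the fan-out visible. `join_by_user` is a hypothetical helper, and rows are assumed to be `(epoch, user, keyword)` tuples as in the message. Each joined row carries exactly one up record, so a per-row filter has nothing to compare it against.

```python
# Sample data from the message: (epoch, user, keyword)
UP = [(9, "user1", "sam1"), (5, "user1", "sam2"), (3, "user1", "sam3"), (9, "user2", "flin")]
TX = [(7, "user1", "wow"), (9, "user2", "pop")]


def join_by_user(tx, up):
    """Inner equi-join on the user field (index 1): every tx row is
    paired with every up row of the same user, tx fields first."""
    return [t + u for t in tx for u in up if t[1] == u[1]]


for row in join_by_user(TX, UP):
    print(",".join(str(field) for field in row))
```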
So I use COGROUP instead and get:
{(7,user1,wow)}, {(9,user1,sam1), (5,user1,sam2), (3,user1,sam3)}
{(9,user2,pop)}, {(9,user2,flin)}
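The same regrouping, sketched in plain Python under the same assumptions. `cogroup_by_user` is hypothetical, and it orders the bags as `COGROUP up BY ..., tx BY ...` would (up bag before tx bag), whereas the listing above shows tx first. Per user, tx is a bag that may hold several rows, so there is no single txEpoch value to compare against.

```python
from collections import defaultdict

# Sample data from the message: (epoch, user, keyword)
UP = [(9, "user1", "sam1"), (5, "user1", "sam2"), (3, "user1", "sam3"), (9, "user2", "flin")]
TX = [(7, "user1", "wow"), (9, "user2", "pop")]


def cogroup_by_user(up, tx):
    """Mimic COGROUP up BY user, tx BY user: one entry per user holding
    (bag of up rows, bag of tx rows). A user missing from one relation
    simply gets an empty bag for it."""
    groups = defaultdict(lambda: ([], []))
    for row in up:
        groups[row[1]][0].append(row)
    for row in tx:
        groups[row[1]][1].append(row)
    return dict(groups)


for user, (up_bag, tx_bag) in cogroup_by_user(UP, TX).items():
    print(user, tx_bag, up_bag)
```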
Now I can FILTER, ORDER and LIMIT inside a nested FOREACH, because all the data for a user is in one row:
recordExtract = FOREACH recordGroup {
recordFiltered = FILTER up BY upEpoch <= tx.txEpoch;
recordOrdered = ORDER recordFiltered BY upEpoch DESC;
recordLimited = LIMIT recordOrdered 1;
GENERATE
recordLimited
;
}
So if I could reference tx.txEpoch properly, I would get the desired result:
7,user1,wow,5,user1,sam2 (upEpoch 5 is the closest to txEpoch 7 without exceeding it)
9,user2,pop,9,user2,flin (upEpoch 9 is the closest to txEpoch 9)
Do you have any clues?
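The intended result can be pinned down in plain Python (not Pig): for each tx row, keep the up row of the same user with the largest epoch that does not exceed the tx epoch. `closest_preceding` is a hypothetical helper. The `<=` comparison is an assumption chosen so that user2's equal epochs (9 and 9) match, as in the desired output. Taking `max` plays the role of ORDER ... DESC followed by LIMIT 1.

```python
# Sample data from the message: (epoch, user, keyword)
UP = [(9, "user1", "sam1"), (5, "user1", "sam2"), (3, "user1", "sam3"), (9, "user2", "flin")]
TX = [(7, "user1", "wow"), (9, "user2", "pop")]


def closest_preceding(tx, up):
    """For each tx row, append the same-user up row whose epoch is the
    largest one not later than the tx epoch (upEpoch <= txEpoch).
    Tx rows with no such up row are dropped, as in an inner join."""
    result = []
    for t in tx:
        candidates = [u for u in up if u[1] == t[1] and u[0] <= t[0]]
        if candidates:
            # max() over upEpoch is the same as ORDER BY upEpoch DESC, then LIMIT 1
            result.append(t + max(candidates, key=lambda u: u[0]))
    return result


for row in closest_preceding(TX, UP):
    print(",".join(str(field) for field in row))
```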
________________________________________
From: Xiaomeng Wan [[email protected]]
Sent: Tuesday, September 13, 2011 11:26 PM
To: [email protected]
Subject: Re: Dumb question guys
tx is a bag, so you cannot use it that way unless it is a scalar. I'm not sure about the logic here, but it looks like you should use a JOIN rather than a COGROUP:
recordGroup = join up BY upInstance, tx BY txInstance;
recordFiltered = FILTER recordGroup BY upEpoch < txEpoch;
Shawn
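Here is a plain-Python sketch (not Pig) of the JOIN plus FILTER suggested above, under the same `(epoch, user, keyword)` assumptions. `join_filter` and `keep_latest_per_tx` are hypothetical helpers. With `<` as written, user1 still has two candidate rows and user2 has none. So two more things are needed: a step that keeps only the latest up row per tx row, and `<=` to keep user2's equal epochs. In Pig that extra step would roughly be a GROUP on the tx fields followed by a nested ORDER ... DESC and LIMIT 1.

```python
# Sample data from the message: (epoch, user, keyword)
UP = [(9, "user1", "sam1"), (5, "user1", "sam2"), (3, "user1", "sam3"), (9, "user2", "flin")]
TX = [(7, "user1", "wow"), (9, "user2", "pop")]


def join_filter(up, tx, inclusive=False):
    """JOIN up BY user, tx BY user (up fields first, so index 0 is
    upEpoch and index 3 is txEpoch), then FILTER upEpoch < txEpoch,
    or upEpoch <= txEpoch when inclusive=True."""
    joined = [u + t for u in up for t in tx if u[1] == t[1]]
    if inclusive:
        return [r for r in joined if r[0] <= r[3]]
    return [r for r in joined if r[0] < r[3]]


def keep_latest_per_tx(rows):
    """Keep, for each distinct tx part (indices 3..5), the row with the
    largest upEpoch; on ties the first row seen wins."""
    best = {}
    for r in rows:
        key = r[3:]
        if key not in best or r[0] > best[key][0]:
            best[key] = r
    return list(best.values())


print(join_filter(UP, TX))                                      # filter only
print(keep_latest_per_tx(join_filter(UP, TX, inclusive=True)))  # filter + pick latest
```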
On Tue, Sep 13, 2011 at 11:54 AM, Marek Miglinski <[email protected]> wrote:
> Hey all, after 4 hours of true torture I hope you can help me (the task is easy):
>
> up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long,
> upInstance:chararray, upKeyword:chararray);
> tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long,
> txInstance:chararray, txKeyword:chararray);
> recordGroup = COGROUP up BY (upInstance), tx BY (txInstance);
>
> recordExtract = FOREACH recordGroup {
> recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
> recordLimited = LIMIT recordFiltered 1;
> GENERATE
> recordLimited
> ;
> }
>
> How do I point Pig to the txEpoch field of my tx input (from recordGroup)?
> None of tx::txEpoch, tx.txEpoch, txEpoch or recordGroup::tx.txEpoch works...
>
> It always fails. With tx::txEpoch: "ERROR 1000: Error during parsing.
> Invalid alias: tx::txEpoch in {upEpoch: long,upInstance: chararray,upKeyword:
> chararray}"
>
> With tx.txEpoch (I know it takes the tx = LOAD relation as its source, but I
> need recordGroup::tx.txEpoch!): "ERROR 2997: Unable to recreate exception from
> backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 0:
> Scalar has more than one row in the output. 1st : (1314835200050,99,sam), 2nd
> :(1314835200079,99,flin)"
>
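On the second error: as the message itself suggests, `tx.txEpoch` there refers to the top-level relation `tx` used as a scalar. That fails as soon as the relation has more than one row. Below is a simplified plain-Python model of that rule. `scalar_field` is hypothetical and is not Pig's actual implementation. It shows why a tx.log with two rows triggers the error.

```python
# Sample tx data from the thread: (epoch, user, keyword)
TX = [(7, "user1", "wow"), (9, "user2", "pop")]


def scalar_field(relation, index):
    """Simplified model of using a whole relation as a scalar: only a
    single-row relation yields a value (empty relations not modeled)."""
    if len(relation) > 1:
        raise ValueError("Scalar has more than one row in the output")
    return relation[0][index]


print(scalar_field([(7, "user1", "wow")], 0))  # one row: works
try:
    scalar_field(TX, 0)                         # two rows: fails
except ValueError as err:
    print(err)
```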