You might want to check out LinkedIn's DataFu contribution, particularly the "sessionize" UDF: http://sna-projects.com/datafu/javadoc/0.0.4/datafu/pig/sessions/Sessionize.html
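
For reference, the merge rule Marco describes in the quoted message below can be sketched in plain Python. This is only an illustration, not DataFu's Sessionize and not a finished Pig UDF. The function name `merge_sessions` and the `max_gap` parameter are hypothetical. It assumes two sessions of the same id are continuous when the next start is at most 1 second after the previous end, and that overlapping sessions should be merged as well.

```python
from itertools import groupby
from operator import itemgetter


def merge_sessions(rows, max_gap=1):
    """Merge (id, start, end) rows whose sessions are continuous.

    Assumption: two sessions of the same id are continuous when the next
    start is at most `max_gap` seconds after the current end.
    """
    merged = []
    # Sort by id, then start, so each id's sessions are adjacent and ordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for sid, group in groupby(ordered, key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is None:
                # First session seen for this id.
                cur_start, cur_end = start, end
            elif start <= cur_end + max_gap:
                # Continuous (or overlapping): extend the current session.
                cur_end = max(cur_end, end)
            else:
                # Gap too large: emit the current session and start a new one.
                merged.append((sid, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((sid, cur_start, cur_end))
    return merged


rows = [
    ("xxx", 1, 3), ("xxx", 4, 7),
    ("yyy", 1, 2), ("yyy", 5, 7),
    ("zzz", 6, 7), ("zzz", 7, 10),
]
print(merge_sessions(rows))
# [('xxx', 1, 7), ('yyy', 1, 2), ('yyy', 5, 7), ('zzz', 6, 10)]
```

The same loop could be ported into a UDF that receives the bag of sessions for one id after a GROUP BY; that wiring is not shown here.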
_____________
Steve Bernstein
VP, Analytics
Rearden Commerce, Inc.
+1.408.499.0961 Mobile
deem.com | reardencommerce.com

-----Original Message-----
From: Marco Cadetg [mailto:[email protected]]
Sent: Thursday, August 30, 2012 4:42 AM
To: [email protected]
Subject: Re: reduce continuous sessions

Unfortunately it's not that simple.

A = LOAD 'comb.txt' USING PigStorage(',') AS (id:chararray,start:long,end:long);
B = FOREACH (GROUP A BY id) {
    GENERATE FLATTEN(group),MIN(A.start),MAX(A.end);
}
dump B;

(xxx,1,7)
(yyy,1,7)
(zzz,6,10)

This is not what I want. I only want to merge rows / sessions if they are continuous, i.e. the end of one session is the start of the next. In my example that is:

xxx,1,3
xxx,4,7

This is continuous, as the end of the first row is the start (+1s) of the next row. Unlike this one, where the end of the first row is NOT the start of the next row:

yyy,1,2
yyy,5,7

Therefore I have to keep track of sessions somehow.

Cheers,
-Marco

On Thu, Aug 30, 2012 at 10:07 AM, Prashant Kommireddi <[email protected]> wrote:

> Seems like you are looking to group by "id" and get the MIN and MAX
> timestamp for each group?
>
>
> On Thu, Aug 30, 2012 at 1:00 AM, Marco Cadetg <[email protected]> wrote:
>
> > Hi there,
> >
> > I have some user sessions which look something along the following lines:
> >
> > id:chararray, start:long(unix timestamp), end:long(unix timestamp)
> > xxx,1,3
> > xxx,4,7
> > yyy,1,2
> > yyy,5,7
> > zzz,6,7
> > zzz,7,10
> >
> > I would like to combine the rows which belong to a continuous
> > session, e.g. in my example the result should be the following:
> > xxx,1,7
> > yyy,1,2
> > yyy,5,7
> > zzz,6,10
> >
> > I guess there is no way to do this directly in Pig, but rather by
> > using a UDF. Can someone give me a pointer on how you would achieve this?
> >
> > Thanks,
> > -Marco
> >
>
