Unfortunately it's not that simple.

A = LOAD 'comb.txt' USING PigStorage(',') AS
    (id:chararray, start:long, end:long);
B = FOREACH (GROUP A BY id) { GENERATE
    FLATTEN(group), MIN(A.start), MAX(A.end); }
DUMP B;
(xxx,1,7)
(yyy,1,7)
(zzz,6,10)

This is not what I want. I only want to merge rows (sessions) if they are
continuous, i.e. the end of one session is the start of the next. In my
example that is:
xxx,1,3
xxx,4,7

These are continuous because the second row starts 1s after the first row
ends.

Unlike this one, here the end of the first row is NOT the start of the next
row...
yyy,1,2
yyy,5,7

Therefore I have to keep track of sessions somehow.
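
For illustration, the merge logic could be sketched roughly as below. This is
plain Python, not a Pig UDF; the names merge_sessions and max_gap are
hypothetical, and the 1-second tolerance is an assumption inferred from the
xxx and zzz examples. The same loop could presumably be ported to a Java or
Jython UDF applied to each bag after GROUP A BY id.

```python
from itertools import groupby
from operator import itemgetter


def merge_sessions(rows, max_gap=1):
    """Merge (id, start, end) rows whose sessions are continuous per id.

    Two sessions count as continuous when the next one starts no later
    than max_gap seconds after the current one ends (assumed tolerance).
    """
    merged = []
    # Sort by id, then start, so continuous sessions end up adjacent.
    for sid, group in groupby(sorted(rows), key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is None:
                # First session for this id.
                cur_start, cur_end = start, end
            elif start <= cur_end + max_gap:
                # Continuous with the running session: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: emit the running session and start a new one.
                merged.append((sid, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((sid, cur_start, cur_end))
    return merged


rows = [
    ("xxx", 1, 3), ("xxx", 4, 7),
    ("yyy", 1, 2), ("yyy", 5, 7),
    ("zzz", 6, 7), ("zzz", 7, 10),
]
for row in merge_sessions(rows):
    print(row)
```

On the sample data this keeps yyy as two rows and collapses xxx and zzz into
one row each.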

Cheers,
-Marco


On Thu, Aug 30, 2012 at 10:07 AM, Prashant Kommireddi
<[email protected]> wrote:

> Seems like you are looking to group by "id" and get the MIN and MAX
> timestamp for each group?
>
>
> On Thu, Aug 30, 2012 at 1:00 AM, Marco Cadetg <[email protected]> wrote:
>
> > Hi there,
> >
> > I have some user sessions which look something like the following:
> >
> > id:chararray, start:long(unix timestamp), end:long(unix timestamp)
> > xxx,1,3
> > xxx,4,7
> > yyy,1,2
> > yyy,5,7
> > zzz,6,7
> > zzz,7,10
> >
> > I would like to combine the rows which belong to a continuous session,
> > e.g. in my example the result should be the following:
> > xxx,1,7
> > yyy,1,2
> > yyy,5,7
> > zzz,6,10
> >
> > I guess there is no way to do this directly in Pig but rather by using a
> > UDF. Can someone give me a pointer on how you would achieve this?
> >
> > Thanks,
> > -Marco
> >
>
