Hello everyone,
It's like a local SUM operation: durations are summed only across consecutive rows that share the same user and activity.
Any pointers or hints would be much appreciated.
Let me know if any additional info is required.
Thanks,

On Fri, Mar 15, 2013 at 10:33 PM, pranjal rajput <[email protected]> wrote:
> Hi,
> I am new to Pig.
> I have a dataset from a time-tracker application.
> It records the time that users spend on various activities.
> For example:
>
> UserId | Activity    | Tool  | BeginTime | EndTime | DurationMinute
> 1      | development | tool1 | 10:00     | 10:15   | 15
> 1      | development | tool2 | 10:15     | 10:30   | 15
> 1      | other       | tool3 | 10:30     | 11:00   | 30
> 1      | development | tool1 | 11:00     | 11:20   | 20
> 1      | other       | tool4 | 11:20     | 12:00   | 40
> 1      | development | tool1 | 12:00     | 12:15   | 15
> 2      | other       | tool3 | 10:00     | 11:00   | 60
> 2      | development | tool1 | 11:00     | 11:20   | 20
> 2      | development | tool2 | 11:20     | 11:30   | 10
>
> I wish to find the uninterrupted time slots spent on
> Activity=development, like this:
>
> UserId | Activity    | SumDurationMinutes
> 1      | development | 30  /* notice that two slots are summed */
> 1      | other       | 30
> 1      | development | 20
> 1      | other       | 40
> 1      | development | 15
> 2      | other       | 60
> 2      | development | 30  /* again a sum */
>
> How can this be done in Pig?
> I am open to writing a UDF for this, or any other workaround.
> Thanks in anticipation,
>
> --
> Best Regards
> Pranjal Rajput

--
Best Regards
Pranjal Rajput
+91-81090-71747
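
To make the desired "local SUM" concrete, here is a minimal Python sketch of the logic, not a Pig solution as such. It assumes rows are ordered per user by BeginTime and that adjacent rows count as uninterrupted (it does not check that one row's EndTime equals the next row's BeginTime). The names ROWS and sum_uninterrupted are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question:
# (user_id, activity, tool, begin_time, end_time, duration_minutes)
ROWS = [
    (1, "development", "tool1", "10:00", "10:15", 15),
    (1, "development", "tool2", "10:15", "10:30", 15),
    (1, "other", "tool3", "10:30", "11:00", 30),
    (1, "development", "tool1", "11:00", "11:20", 20),
    (1, "other", "tool4", "11:20", "12:00", 40),
    (1, "development", "tool1", "12:00", "12:15", 15),
    (2, "other", "tool3", "10:00", "11:00", 60),
    (2, "development", "tool1", "11:00", "11:20", 20),
    (2, "development", "tool2", "11:20", "11:30", 10),
]


def sum_uninterrupted(rows):
    """Sum durations over runs of consecutive rows with the same user and activity."""
    # Order by user, then begin time. Zero-padded HH:MM strings sort
    # chronologically; this is an assumption about the input format.
    ordered = sorted(rows, key=itemgetter(0, 3))
    result = []
    # groupby starts a new group whenever the key changes, so only
    # adjacent rows with the same (user, activity) are merged.
    for (user, activity), run in groupby(ordered, key=itemgetter(0, 1)):
        result.append((user, activity, sum(row[5] for row in run)))
    return result


if __name__ == "__main__":
    for user, activity, minutes in sum_uninterrupted(ROWS):
        print(user, activity, minutes)
```

In Pig, the same loop could plausibly live inside a UDF that receives each user's rows as a bag, after grouping by UserId and ordering the bag by BeginTime; that wiring is an untested suggestion rather than a verified script.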
