Hi,
Let's say I have a data set, in chronological order, of units of output per
worker per second covering a whole day.
Example:
2013-07-26T14:00:00, Joe, 50
2013-07-26T14:10:00, Jane, 60
2013-07-26T14:15:00, Joe, 55
2013-07-26T14:20:00, Jane, 60
I create the data set above by loading a larger data set and projecting these
three fields into a relation.
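Since the windowing below works in minutes, each timestamp first has to become a minute-of-day number, and the fields need trimming because the sample mixes `Jane,60` and `Joe, 50`. A minimal plain-Python sketch of that step, assuming every record falls on the same day and uses the `YYYY-MM-DDTHH:MM:SS` format shown; `parse_record` is a hypothetical helper name:

```python
from datetime import datetime
from typing import Tuple


def parse_record(line: str) -> Tuple[int, str, int]:
    """Turn 'timestamp, worker, output' into (minute_of_day, worker, output).

    Hypothetical helper: assumes all records are from the same day and that
    timestamps look like '2013-07-26T14:10:00'. Whitespace around each field
    is stripped, so 'Jane,60' and 'Joe, 50' both parse.
    """
    ts, worker, output = (field.strip() for field in line.split(","))
    t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
    return t.hour * 60 + t.minute, worker, int(output)


if __name__ == "__main__":
    print(parse_record("2013-07-26T14:10:00, Jane,60"))
```

For the sample rows this gives minutes 840, 850, 855 and 860.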
Now I want to count output per worker per time period, say every ten
minutes, but as a rolling count: a ten-minute window that slides forward one
minute at a time. The pseudo-code would be something along these lines:
-----------xxxxxxxxxxxxxxxxx-------------------
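A = LOAD 'input' USING PigStorage(',') AS (timestamp, worker, output:int);
-- assumes timestamp is already expressed as minutes since midnight
ts1 = 0
ts2 = 1440  -- 24 hours x 60 mins/hr
for (i = ts1; i <= (ts2 - 10); i++)
{
  -- half-open ten-minute window [i, i + 10)
  R1 = FILTER A BY timestamp >= $i AND timestamp < ($i + 10);
  GRP = GROUP R1 BY worker;
  CNT = FOREACH GRP GENERATE group, COUNT(R1);
  DUMP CNT;
}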
-----------xxxxxxxxxxxxxxxxx-------------------
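To pin down what the loop is meant to produce, here is a plain-Python sketch of the same computation (not Pig). It assumes timestamps are already minutes since midnight, that the window is half-open `[i, i + 10)`, and that "count" means number of records per worker, as `COUNT` does. `rolling_counts` is a hypothetical name:

```python
from collections import Counter
from typing import Dict, List, Tuple

# (minute_of_day, worker, output); assumes timestamps were already converted
Record = Tuple[int, str, int]


def rolling_counts(records: List[Record], day_minutes: int = 1440,
                   window: int = 10) -> Dict[int, Dict[str, int]]:
    """Mirror the pseudo-code loop: for every window start i in
    [0, day_minutes - window], count records per worker whose minute
    falls in the half-open window [i, i + window)."""
    result: Dict[int, Dict[str, int]] = {}
    for i in range(day_minutes - window + 1):  # i = 0 .. 1430 with defaults
        # equivalent of FILTER + GROUP BY worker + COUNT for this window
        counts = Counter(worker for minute, worker, _ in records
                         if i <= minute < i + window)
        if counts:  # skip windows with no activity
            result[i] = dict(counts)
    return result


if __name__ == "__main__":
    sample = [(840, "Joe", 50), (850, "Jane", 60),
              (855, "Joe", 55), (860, "Jane", 60)]
    for start, per_worker in sorted(rolling_counts(sample).items()):
        print(start, per_worker)
```

This rescans all records for each of the 1431 windows, just like the loop does. If total output per window is wanted rather than the number of records, the `Counter` would add `output` per worker instead of counting rows.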
But I can't figure out how to do this simple iteration in Pig using
FOREACH. I think the answer is to create a relation containing all the
minutes in a day {0 ... 1439} and then iterate over it?
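A relation of all minutes would have to be crossed with every record and then filtered. A related idea that avoids the full cross: give each record the list of window starts that contain it, flatten, and group by (window start, worker). In Pig terms that would roughly be a bag of starts per record, `FLATTEN`, then `GROUP BY`, though building the bag would likely need a UDF. That is an assumption, not something confirmed here. A plain-Python sketch of the idea with hypothetical helper names, assuming minute-of-day timestamps and half-open windows:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (minute_of_day, worker, output); assumes timestamps were already converted
Record = Tuple[int, str, int]


def window_starts(minute: int, day_minutes: int = 1440,
                  window: int = 10) -> range:
    """Start minutes i of every window [i, i + window) that contains `minute`,
    clipped so windows stay inside the day."""
    first = max(0, minute - window + 1)
    last = min(minute, day_minutes - window)
    return range(first, last + 1)


def rolling_counts_by_expansion(records: Iterable[Record],
                                day_minutes: int = 1440,
                                window: int = 10) -> Dict[int, Dict[str, int]]:
    # "flatten": one (window_start, worker) pair per window a record falls in
    pairs = Counter(
        (start, worker)
        for minute, worker, _ in records
        for start in window_starts(minute, day_minutes, window)
    )
    # "group": collect the per-worker counts under each window start
    result: Dict[int, Dict[str, int]] = {}
    for (start, worker), n in pairs.items():
        result.setdefault(start, {})[worker] = n
    return result
```

Each record expands to at most `window` pairs, so the work grows with the number of records rather than records × 1431 minutes.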
Sorry if my Pig terminology isn't correct; I've only been using it for a
day.
Any pointers will be highly appreciated.
TIA.