Hi,

Let's say I have a data set of units of output per worker per second that's
in chronological order for a whole day.

Example:
2013-07-26T14:00:00, Joe, 50
2013-07-26T14:10:00, Jane, 60
2013-07-26T14:15:00, Joe, 55
2013-07-26T14:20:00, Jane, 60

I create the data set above by loading a larger data set and getting these
three attributes in a relation.

Now I want to count output per worker per time period, say every ten
minutes, but as a rolling count: a ten-minute window that advances one
minute at a time. The pseudo-code would be something along the lines of:

-----------xxxxxxxxxxxxxxxxx-------------------
A = LOAD 'input' AS (timestamp, worker, output);

ts1 = 0
ts2 = 1440   -- 24 hours x 60 mins/hr

for (i = ts1; i <= ts2 - 10; i++)
   {
     -- half-open ten-minute window [i, i + 10), in minutes since midnight
     R1 = FILTER A BY timestamp >= $i AND timestamp < ($i + 10);
     GRP = GROUP R1 BY (worker, output);
     CNT = FOREACH GRP GENERATE group, COUNT(R1);
     DUMP CNT;
   }
-----------xxxxxxxxxxxxxxxxx-------------------

But I can't figure out how to do this simple iteration in Pig using
FOREACH. I think the answer is to create a relation that contains all the
minutes in a day {0, ..., 1440} and then iterate over it?
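
To make the "relation of minutes" idea concrete, here is a minimal sketch in
Python (not Pig) of the result I am after. It pairs every window start with
every row, keeps the pairs where the row falls inside the window, and then
groups by (window start, worker). The names minute_of_day and rolling_totals
are made up for illustration. I am assuming half-open windows
[start, start + 10), timestamps that all fall on the same day, and that
reporting both the number of rows and the summed output per worker is
acceptable, since I am not sure which of the two I should call the "count".

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the example above: (timestamp, worker, output).
ROWS = [
    ("2013-07-26T14:00:00", "Joe", 50),
    ("2013-07-26T14:10:00", "Jane", 60),
    ("2013-07-26T14:15:00", "Joe", 55),
    ("2013-07-26T14:20:00", "Jane", 60),
]

WINDOW = 10                # window length in minutes (assumed)
MINUTES_PER_DAY = 24 * 60


def minute_of_day(timestamp):
    """Convert an ISO-style timestamp to minutes since midnight."""
    t = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    return t.hour * 60 + t.minute


def rolling_totals(rows, window=WINDOW):
    """Return {(window_start, worker): (row_count, total_output)}.

    Each window is half-open: [window_start, window_start + window).
    """
    # The "minutes" relation: every window start that fits inside the day.
    starts = range(0, MINUTES_PER_DAY - window + 1)
    totals = defaultdict(lambda: [0, 0])
    # Cross every window start with every row, then filter on membership.
    for start in starts:
        for timestamp, worker, output in rows:
            if start <= minute_of_day(timestamp) < start + window:
                entry = totals[(start, worker)]
                entry[0] += 1        # number of rows in the window
                entry[1] += output   # summed output in the window
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    for key, value in sorted(rolling_totals(ROWS).items()):
        print(key, value)
```

Windows with no rows for a worker simply do not appear in the result. In
Pig terms I imagine the two nested loops would become a CROSS (or some kind
of join) followed by a FILTER and a GROUP, but that part is a guess.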

Sorry if my Pig terminology isn't correct. I have been using it only for a
day now.

Any pointers will be highly appreciated.

TIA.
