Right now I have a Pig script to roll up time-series data.

The data is currently in the following tab-separated format:
ts service-uuid service-name type value

So the first step is to take each timestamp and snap it to a period.
For 5 min rollups I use something like this:
snapped = FOREACH X GENERATE SnapTs(300, ts) ....
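
To make the snapping concrete, here is a small Python sketch (not Pig) of
roughly what it amounts to, assuming the timestamps are epoch seconds and
that snapping means flooring to the start of the period. The name snap_ts
is just for illustration.

```python
def snap_ts(period: int, ts: int) -> int:
    """Floor an epoch timestamp (in seconds) to the start of its period."""
    return ts - (ts % period)


# A timestamp 99 seconds into a 5 minute bucket snaps back to the bucket start.
print(snap_ts(300, 1299))
```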

Then I group, average, and count over each group, which is great and
easy.  The next bit is to show the change from 0 -> 5 min.  Basically,
I want to subtract the Point A avg from the Point B avg and divide by
the difference between the two timestamps to get the rate of change
between the points, but I am not sure how to do that.  For instance,
one idea I had was to create another dataset like this:

previous = FOREACH snapped GENERATE $0 + 300, ....

GROUP previous BY (...), snapped BY (...)

But that seems like a waste.  I am just having a hard time modeling
that.  Any help would be appreciated.
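
To make the goal concrete, here is a rough Python sketch (not Pig) of the
result I am after. The grouping key (service-uuid, type), the sample rows,
and the names rollup and rates are placeholders for illustration only.
Buckets with no previous bucket are simply skipped.

```python
from collections import defaultdict


def rollup(rows, period=300):
    """Average and count values per (service_uuid, type, snapped ts)."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, service_uuid, service_name, typ, value in rows:
        # Assumed grouping key; the real script may group differently.
        key = (service_uuid, typ, ts - ts % period)
        sums[key][0] += value
        sums[key][1] += 1
    return {k: (total / n, n) for k, (total, n) in sums.items()}


def rates(averages, period=300):
    """Rate of change between each bucket and the bucket one period earlier."""
    out = {}
    for (service_uuid, typ, ts), (avg, _count) in averages.items():
        prev = averages.get((service_uuid, typ, ts - period))
        if prev is not None:
            # (Point B avg - Point A avg) / (difference in timestamps)
            out[(service_uuid, typ, ts)] = (avg - prev[0]) / period
    return out


# Made-up sample rows: ts, service-uuid, service-name, type, value
rows = [
    (0, "u1", "web", "cpu", 10.0),
    (100, "u1", "web", "cpu", 20.0),
    (300, "u1", "web", "cpu", 45.0),
]
print(rates(rollup(rows)))
```

Looking up the bucket at ts - period is the same idea as shifting one copy
of the data forward by 300 and matching the two on the same key.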

Best,

-- 
Dan Di Spaltro
