Right now I have a Pig script that rolls up time-series data. The data is currently a tab-separated list with the following fields:

    ts    service-uuid    service-name    type    value

The first step is to take each timestamp and snap it to a period. For 5-minute rollups I use something like this:

    snapped = FOREACH X GENERATE SnapTs(300, ts) ....

I then group, and average and count over each group, which is great and easy.

The next bit is to show the change from 0 -> 5 min. Basically, I want to subtract the Point A average from the Point B average and divide by the time between the two timestamps to get the rate of change between the points, but I am not sure how to do that.

One idea I had was to create another dataset with every timestamp shifted forward by one period, and then group it together with the original, like this:

    previous = FOREACH snapped GENERATE $0 + 300, ....
    GROUP previous BY (...), snapped BY (...)

But that seems like a waste, and I am having a hard time modeling it. Any help would be appreciated.

Best,
-- 
Dan Di Spaltro
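
P.S. To make the target calculation concrete, here is a rough sketch of the arithmetic I am after, written in Python rather than Pig. It is only an illustration under a few assumptions: that SnapTs rounds a timestamp down to the start of its period, and that the grouping key is (service-uuid, type, snapped ts). The function names are made up for this example.

```python
from collections import defaultdict

PERIOD = 300  # rollup width in seconds (5 minutes)


def snap_ts(period, ts):
    """Round ts down to the start of its period (assumed SnapTs behaviour)."""
    return ts - (ts % period)


def rollup(rows, period=PERIOD):
    """Average and count values per (service-uuid, type, snapped ts)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, uuid, name, typ, value in rows:
        key = (uuid, typ, snap_ts(period, ts))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: (total / n, n) for key, (total, n) in sums.items()}


def rates(averages, period=PERIOD):
    """Rate of change between each bucket and the bucket one period earlier."""
    out = {}
    for (uuid, typ, ts), (avg, _count) in averages.items():
        prev = averages.get((uuid, typ, ts - period))
        if prev is not None:  # the first bucket of a series has no predecessor
            out[(uuid, typ, ts)] = (avg - prev[0]) / period
    return out


# Hypothetical sample rows: ts, service-uuid, service-name, type, value
rows = [
    (0, "u1", "web", "cpu", 10.0),
    (100, "u1", "web", "cpu", 20.0),
    (300, "u1", "web", "cpu", 30.0),
    (450, "u1", "web", "cpu", 60.0),
]
print(rates(rollup(rows)))
```

The lookup of the bucket at ts - period is the part I do not know how to express cleanly in Pig; the shifted "previous" dataset above is my attempt at the same thing.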
