You are right, my example should have included a timestamp column. Thanks, I will look into writing the UDF.

regards,
Eric

On Fri, Dec 31, 2010 at 1:16 AM, Dmitriy Ryaboy <[email protected]> wrote:
> Can't without a way of ordering the data for the same key.
>
> If you do have a way to do this (a timestamp or some such), you can group by
> key, order the resulting group inside the foreach, and then run it through a
> UDF (you can even make this UDF accumulative).
>
> grouped = group data by key;
> deltas = foreach grouped {
>   -- the bag produced by GROUP is named after the input relation (data)
>   ordered_tuples = order data by ordinal;
>   -- the grouping key is exposed as the field "group"
>   generate group, FLATTEN(calculateDeltas(ordered_tuples));
> }
>
> -D
>
> On Thu, Dec 30, 2010 at 10:12 PM, Eric Yang <[email protected]> wrote:
>
>> Hi,
>>
>> What is the most efficient method to calculate the delta of columns?
>> Consider this:
>>
>> (key1, 1, 2, 3)
>> (key1, 2, 4, 5)
>> (key2, 1, 2, 4)
>> (key1, 3, 6, 9)
>> (key2, 2, 4, 6)
>>
>> The expected transformation output should look like this:
>>
>> (key1, 1, 2, 2)
>> (key1, 1, 2, 4)
>> (key2, 1, 2, 2)
>>
>> The idea is to group by f0, and compute f1 (current value) - f1
>> (previous value). How would I write this in Pig?
>>
>> If there is an underflow value, it should reset to 0, for example:
>>
>> (key1, 1, 2, 3)
>> (key1, 0, 0, 0)
>> (key1, 2, 3, 4)
>>
>> The output should be:
>>
>> (key1, 0, 0, 0)
>> (key1, 2, 3, 4)
>>
>> I haven't been able to find a solution on Google. Anyone?
>>
>> regards,
>> Eric
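
For reference, the per-key delta logic discussed in this thread can be sketched roughly as below. This is plain Python, not a Pig UDF, and it rests on a few assumptions: the names calculate_deltas and deltas_by_key are hypothetical, rows are assumed to arrive already ordered within each key (standing in for the timestamp ordering), and "underflow" is assumed to mean that any negative per-column difference is clamped to 0.

```python
from collections import defaultdict


def calculate_deltas(ordered_rows):
    """Return per-column deltas between consecutive rows of a single key.

    ordered_rows: list of numeric tuples, already sorted (e.g. by timestamp).
    Assumption: a negative difference counts as underflow and is reset to 0.
    """
    deltas = []
    previous = None
    for row in ordered_rows:
        if previous is not None:
            # current value minus previous value, clamped at 0 per column
            deltas.append(
                tuple(max(cur - prev, 0) for cur, prev in zip(row, previous))
            )
        previous = row
    return deltas


def deltas_by_key(rows):
    """Group rows by their first field, then compute deltas within each group.

    Assumption: input order is the intended order within each key.
    """
    grouped = defaultdict(list)
    for key, *values in rows:
        grouped[key].append(tuple(values))

    result = []
    for key, group in grouped.items():
        for delta in calculate_deltas(group):
            result.append((key, *delta))
    return result


if __name__ == "__main__":
    sample = [
        ("key1", 1, 2, 3),
        ("key1", 2, 4, 5),
        ("key2", 1, 2, 4),
        ("key1", 3, 6, 9),
        ("key2", 2, 4, 6),
    ]
    for row in deltas_by_key(sample):
        print(row)
```

A group with a single row produces no output, since there is no previous value to subtract. In an actual Pig job the grouping and ordering would be done by Pig itself, so only the calculate_deltas part would need porting into the UDF.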
