You can't, not without a way of ordering the rows that share a key.

If you do have a way to do this (a timestamp or some such), you can group by
key, order each group's bag inside the foreach, and then run the ordered bag
through a UDF (you can even make this UDF accumulative).

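-- after the group, each row has two fields: 'group' (the key) and the bag 'data'
grouped = group data by key;
deltas = foreach grouped {
    -- sort this key's bag; 'ordinal' stands for whatever field gives you the order
    ordered_tuples = order data by ordinal;
    -- calculateDeltas is a UDF you write: sorted bag in, bag of deltas out
    generate group, FLATTEN(calculateDeltas(ordered_tuples));
}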
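
For what it's worth, here is a rough sketch of the logic calculateDeltas
would need. It is plain Python rather than an actual Pig UDF, and the function
name, the list-of-tuples input, and the per-column clamp to 0 are all my
assumptions (the clamp is just one reading of your underflow example). A real
UDF would also have to unpack the bag it is handed and return a bag.

```python
def calculate_deltas(ordered_rows):
    """Return per-column differences between consecutive rows.

    ordered_rows: list of equal-length numeric tuples for ONE key, already
    sorted by the ordering field (key and ordering field stripped off).
    Assumption: a negative difference (underflow) is clamped to 0.
    """
    deltas = []
    # Pair each row with the one that follows it: (row0, row1), (row1, row2), ...
    for prev, cur in zip(ordered_rows, ordered_rows[1:]):
        deltas.append(tuple(max(0, c - p) for p, c in zip(prev, cur)))
    return deltas


# The key1 rows from the question, in order.
print(calculate_deltas([(1, 2, 3), (2, 4, 5), (3, 6, 9)]))
# The underflow example from the question.
print(calculate_deltas([(1, 2, 3), (0, 0, 0), (2, 3, 4)]))
```

N rows for a key give N-1 delta rows, so a key with a single row produces
nothing, which matches key2 in your expected output having one row.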


-D


On Thu, Dec 30, 2010 at 10:12 PM, Eric Yang <[email protected]> wrote:

> Hi,
>
> What is the most efficient method to calculate delta of columns?  Consider
> this:
>
> (key1, 1, 2, 3)
> (key1, 2, 4, 5)
> (key2, 1, 2, 4)
> (key1, 3, 6, 9)
> (key2, 2, 4, 6)
>
> The expected transformation output should look like this:
>
> (key1, 1, 2, 2)
> (key1, 1, 2, 4)
> (key2, 1, 2, 2)
>
> The idea is to group by f0, and compute f1 (current value) - f1
> (previous value).  How to write this in pig?
>
> If there is an underflow value, it should reset to 0, for example:
>
> (key1, 1, 2, 3)
> (key1, 0, 0, 0)
> (key1, 2, 3, 4)
>
> The output should be:
>
> (key1, 0, 0, 0)
> (key1, 2, 3, 4)
>
> I haven't been able to find a solution from google.  Anyone?
>
> regards,
> Eric
>
