What you are suggesting is fundamentally a single-threaded process (it
can be parallelized, but that's not pretty and takes multiple passes),
so it's not a good fit for the map-reduce paradigm (how would you
compute cumulative totals over 25 billion entries?).  Pig tends to avoid
implementing operations that restrict scaling in this way. Your idea of
streaming through a script would work; you could also write a UDF that
accumulates the totals and apply it to the result of a GROUP ALL on
your relation.
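
For the streaming route, here is a minimal sketch of what the external
script could look like. It assumes the rows arrive as tab-separated
(hour, collected) lines, already sorted, and that all of them pass
through a single instance of the script; otherwise each instance would
only see part of the data and the totals would be wrong. The script
name cumsum.py and the function name are made up for illustration.

```python
#!/usr/bin/env python
# cumsum.py (hypothetical name): running total over tab-separated rows.
# Assumes input lines look like "hour<TAB>collected" and arrive in the
# order the totals should accumulate in.
import sys


def running_totals(lines):
    """Yield "hour<TAB>cumulative_total" for each input line."""
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        hour, collected = line.split("\t")
        total += int(collected)
        yield "%s\t%d" % (hour, total)


if __name__ == "__main__":
    for row in running_totals(sys.stdin):
        print(row)
```

On the Pig side you would hand the ordered relation to this script
with the STREAM ... THROUGH operator and declare the output schema in
its AS clause.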

-Dmitriy

On Fri, Dec 17, 2010 at 11:31 AM, Kris Coward <[email protected]> wrote:

> Hello,
>
> Is there some sort of mechanism by which I could cause a value to
> accumulate within a relation? What I'd like to do is something along the
> lines of having a long called accumulator, and an outer bag called
> hourlyTotals with a schema of (hour:int, collected:int)
>
> accumulator = 0L; -- I know this line doesn't work
> ORDER hourlyTotals BY collected;
> cumulativeTotals = FOREACH hourlyTotals {
>                        accumulator += collected;
>                        GENERATE day, accumulator AS collected;
>                        }
>
> Could something like this be made to work? Is there something similar that
> I can do instead? Do I just need to pipe the relation through an
> external script to get what I want?
>
> Thanks,
> Kris
>
> --
> Kris Coward                                     http://unripe.melon.org/
> GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
>
