Hi Yong,

Could you share more details about the HIVE UDF you have written for this use case? As suggested, I would like to try this approach and see if that simplifies the solution to my requirement.

~Sarath.


On Friday 05 October 2012 12:32 AM, java8964 java8964 wrote:
I did the cumulative sum in the HIVE UDF, as one of the project for my employer.

1) You need to decide the grouping elements for your cumulative sum, for example an account, a department, etc. In the mapper, combine this information as your emit key.
2) If you don't have any grouping requirement and just want a cumulative sum over all your data, then send all the data to one common key, so they will all go to the same reducer.
3) If the cumulative sum output needs to be in a sorting order, you need to do a secondary sort, so the data will arrive in the reducer in the order you want.
4) In the reducer, just do the sum and emit one value per original record (not one per key).
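The reducer-side logic of those steps can be sketched in plain Java (Hadoop classes omitted so it stands alone; the class and method names here are illustrative, not from the original mail):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the reducer logic for a cumulative sum, outside Hadoop:
// all values for one grouping key (e.g. one account) arrive together,
// already ordered by the secondary sort, and we emit one running total
// per input record rather than one total per key.
public class CumulativeSumSketch {

    // Returns the running total after each value, in input order.
    static List<Long> cumulativeSum(List<Long> sortedValues) {
        List<Long> out = new ArrayList<>();
        long runningTotal = 0;
        for (long v : sortedValues) {
            runningTotal += v;      // add this record's amount
            out.add(runningTotal);  // one output per record, not per key
        }
        return out;
    }

    public static void main(String[] args) {
        // Values for one account, in the order the secondary sort produced.
        System.out.println(cumulativeSum(Arrays.asList(100L, 50L, 25L)));
        // -> [100, 150, 175]
    }
}
```

Note this is also why the output is as large as the input: every incoming record produces one output record.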

I would suggest you do this in a Hive UDF, as it is much easier, provided you can build a Hive schema on top of your data.
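For the Hive route, the core of such a UDF is just a stateful running total. Here is a plain-Java sketch of that state (the class and method names are illustrative; a real Hive UDF would extend Hive's UDF class and be annotated stateful, e.g. with @UDFType(deterministic = false, stateful = true)):

```java
// Sketch of the state a stateful running-sum Hive UDF would keep.
// In real Hive code this would extend org.apache.hadoop.hive.ql.exec.UDF;
// here it is plain Java so the idea stands on its own.
public class RunningSum {
    private double total = 0.0;

    // Called once per row; returns the cumulative sum so far.
    public double evaluate(double amount) {
        total += amount;
        return total;
    }
}
```

Because the UDF carries state across rows, the rows must reach it in the intended order for the running total to be meaningful.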

Yong

------------------------------------------------------------------------
From: tdunn...@maprtech.com
Date: Thu, 4 Oct 2012 18:52:09 +0100
Subject: Re: Cumulative value using mapreduce
To: user@hadoop.apache.org

Bertrand is almost right.

The only difference is that the original poster asked about cumulative sum.

This can be done in the reducer exactly as Bertrand described, except for two points that make it different from word count:

a) you can't use a combiner

b) the output of the program is as large as the input so it will have different performance characteristics than aggregation programs like wordcount.

Bertrand's key recommendation to go read a book is the most important advice.

On Thu, Oct 4, 2012 at 5:20 PM, Bertrand Dechoux <decho...@gmail.com> wrote:

    Hi,

    It sounds like a
    1) group information by account
    2) compute sum per account

    If that's not the case, you should describe your context a bit
    more precisely.

    This computation looks like a small variant of wordcount. If you
    do not know how to do it, you should read a book about Hadoop
    MapReduce and/or an online tutorial. Yahoo's is old but still a
    nice read to begin with: http://developer.yahoo.com/hadoop/tutorial/

    Regards,

    Bertrand


    On Thu, Oct 4, 2012 at 3:58 PM, Sarath
    <sarathchandra.jos...@algofusiontech.com> wrote:

        Hi,

        I have a file which has some financial transaction data. Each
        transaction will have an amount and a credit/debit indicator.
        I want to write a mapreduce program which computes cumulative
        credit & debit amounts at each record
        and appends these values to the record before dumping into the
        output file.

        Is this possible? How can I achieve this? Where should I put
        the logic of computing the cumulative values?

        Regards,
        Sarath.




-- Bertrand Dechoux

