Re: Re: Re: Re: Filtering by value in Reducer

Peter Ruch Tue, 12 May 2015 07:24:30 -0700

Hi,

No, I did not create any custom logs; I was only looking through the "standard" logs. I just started out with Hadoop and did not think of explicitly logging that part of the code, as I thought that I was simply missing a small detail that one of you might spot.


But I will definitely look into the custom logging and post my findings.

@ Shahab and Drake: Thank you very much for your help.


Best,
Peter


On 12.05.2015 14:57, Shahab Yunus wrote:

Have you tried explicitly printing or logging in your reducer around the code that compares and then outputs the values? Maybe that will give you a clue as to what is happening. Debug the threshold value that you get in the reducer and check whether it is what you have set (in the case where you set it to greater than -1).

You can also try using the compareTo method for comparing IntWritables, though I doubt that would make any difference.
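The logging suggested here could look roughly like the sketch below. It is a plain-Java stand-in, not Hadoop code: the class `ReducerLoggingSketch` and the helper `shouldWrite` are hypothetical names, plain `Integer` values take the place of `IntWritable`, and `Integer.compare` takes the place of `IntWritable.compareTo`. The idea is only to record the number of values, the sum, the threshold, and the decision for each key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Plain-Java stand-in for the reducer's compare-and-write step (no Hadoop types).
final class ReducerLoggingSketch {
    private static final Logger LOG = Logger.getLogger("ReducerLoggingSketch");

    // Hypothetical helper: sums the values for one key, logs the comparison,
    // and returns whether the pair would be written to the output.
    static boolean shouldWrite(String key, List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        // Integer.compare plays the role of IntWritable.compareTo here.
        boolean write = Integer.compare(sum, threshold) > 0;
        LOG.info(String.format("key=%s values=%d sum=%d threshold=%d write=%b",
                key, values.size(), sum, threshold, write));
        return write;
    }

    public static void main(String[] args) {
        shouldWrite("hadoop", Arrays.asList(1, 1, 1), 1); // sum 3 exceeds 1
        shouldWrite("rare", Arrays.asList(1), 1);         // sum 1 does not exceed 1
    }
}
```

A logged value count or sum that is smaller than the true total for a key would suggest that the values were already aggregated or dropped before they reached the reducer.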


Shahab

On May 12, 2015 8:17 AM, "Peter Ruch" <[email protected]> wrote:


    Hi,

    I already skimmed through the logs but I could not find anything
    special.

    I am just really confused why I am having this problem.

    If the Iterable<...> for a specific key contains all of the
    observed values - and it seems to do so
    otherwise the program wouldn't work correctly in the standard case
    with [[ threshold = -1 ]] -
    it should also work when I only write the key-value pairs to the
    output file that satisfy the condition [[ sum > threshold ]].
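One possible explanation, which nobody in this thread confirmed: the WordCount v1.0 driver from the tutorial registers the reducer class as the combiner too (`job.setCombinerClass(IntSumReducer.class)`). If that line was kept, the filter would also run on each map task's partial sums, so a key whose partial sums are each at or below the threshold could be dropped before the reducer ever sees it. The sketch below simulates this in plain Java without Hadoop types; the class `CombinerFilterSketch`, its methods, and the sample counts are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop types) of a filtering reduce function that
// is optionally also applied as a combiner to each map task's output.
final class CombinerFilterSketch {

    // The thread's reduce logic: sum per key, keep only sums above threshold.
    static Map<String, Integer> sumAndFilter(List<Map<String, Integer>> inputs, int threshold) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map<String, Integer> input : inputs) {
            for (Map.Entry<String, Integer> e : input.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            if (e.getValue() > threshold) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Each element of splits holds the per-word counts of one map task.
    static Map<String, Integer> run(List<Map<String, Integer>> splits, int threshold,
                                    boolean useCombiner) {
        if (!useCombiner) {
            return sumAndFilter(splits, threshold);
        }
        // Combiner step: the same filtering logic runs on each split alone.
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (Map<String, Integer> split : splits) {
            combined.add(sumAndFilter(Collections.singletonList(split), threshold));
        }
        return sumAndFilter(combined, threshold);
    }

    public static void main(String[] args) {
        Map<String, Integer> split1 = new TreeMap<>();
        split1.put("a", 1);
        split1.put("b", 3);
        Map<String, Integer> split2 = new TreeMap<>();
        split2.put("a", 1);
        split2.put("b", 1);
        List<Map<String, Integer>> splits = Arrays.asList(split1, split2);

        System.out.println(run(splits, 1, false)); // prints {a=2, b=4}
        System.out.println(run(splits, 1, true));  // prints {b=3}
        System.out.println(run(splits, -1, true)); // prints {a=2, b=4}
    }
}
```

With the threshold at -1 nothing is filtered in either stage, which would match the observation that the standard case works. If this is the cause, removing the `job.setCombinerClass(...)` call from the driver, or registering a separate combiner that only sums without filtering, would be the things to try.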

    Did I miss something? Maybe I have to handle these cases in a
    specific way, but I did not find anything about that online.


    Thank you for your help,

    Peter



    On 12.05.2015 12:35, Drake민영근 wrote:

    Hi, Peter

    The missing records, are they just gone without any logs? How
    about your reduce task logs?

    Thanks

    Drake 민영근 Ph.D
    kt NexR

    On Tue, May 12, 2015 at 5:18 AM, Peter Ruch
    <[email protected]> wrote:

        Hello,

        sum and threshold are both ints.
        For the threshold variable, I first add a new resource to the
        configuration - conf.addResource( ... );

        Later, I get the threshold value from the configuration.

        Code
        #####################################

        private int threshold;

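        @Override
        public void setup( Context context ) {
                  // Read the threshold from the job configuration; -1 is the
                  // fallback when the "threshold" property is not set.
                  Configuration conf = context.getConfiguration();
                  threshold = conf.getInt( "threshold", -1 );
        }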

        #####################################


        Best,
        Peter



        On 11.05.2015 19:26, Shahab Yunus wrote:

        What is the type of the threshold variable? sum I believe is
        a Java int.

        Regards,
        Shahab

        On Mon, May 11, 2015 at 1:08 PM, Peter Ruch
        <[email protected]> wrote:

            Hi,

            I am currently playing around with Hadoop and have some
            problems when trying to filter in the Reducer.

            I extended the WordCount v1.0 example from the 2.7
            MapReduce Tutorial with some additional functionality
            and added the possibility to filter by the specific
            value of each key - e.g. only output the key-value pairs
            where [[ value > threshold ]].

            Filtering Code in Reducer
            #####################################

            int sum = 0;
            for (IntWritable val : values) {
                 sum += val.get();
            }
            // Only emit keys whose total exceeds the threshold.
            if ( sum > threshold ) {
                 result.set(sum);
                 context.write(key, result);
            }

            #####################################

            For a threshold smaller than any value, the above code works
            as expected and the output contains all key-value pairs.
            If I increase the threshold to 1, some pairs are missing
            in the output although their respective values would be
            larger than the threshold.

            I tried to work out the error myself, but I could not
            get it to work as intended. I use the exact Tutorial
            setup with Oracle JDK 8
            on a CentOS 7 machine.

            As far as I understand the respective Iterable<...>  in
            the Reducer already contains all the observed values for
            a specific key.
            Why is it possible that I am missing some of these
            key-value pairs then? It only fails in very few cases.
            The input file is pretty large - 250 MB -
            so I also tried to increase the memory for the mapping
            and reduction steps, but it did not help (I tried a lot of
            different things without success).

            Maybe someone already experienced similar problems / is
            more experienced than I am.


            Thank you,

            Peter
