Hi Scott,

The pair types are Pair<CharSequence, SomeSpecificJavaClass>, but in essence,
when I call "collect()" I always provide a java.lang.String object.
The reduce method is reduce(CharSequence key, Iterable<SomeSpecificJavaClass> values, .....)

Some more detailed info:

the jobtracker and namenode run with:
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

the tasktrackers and datanodes run with:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Hadoop version is: cdh3u1

Thanks for suggestions,
Vyacheslav

On Aug 17, 2011, at 3:56 AM, Scott Carey wrote:

> On 8/16/11 3:56 PM, "Vyacheslav Zholudev" <[email protected]>
> wrote:
>
>> Hi, Scott,
>>
>> thanks for your reply.
>>
>>> What Avro version is this happening with? What JVM version?
>>
>> We are using Avro 1.5.1 and Sun JDK 6, but the exact version I will have
>> to look up.
>>
>>> On a hunch, have you tried adding -XX:-UseLoopPredicate to the JVM args
>>> if it is Sun and JRE 6u21 or later? (Some issues in loop predicates
>>> affect Java 6 too, just not as many as the recent news on Java 7.)
>>>
>>> Otherwise, it may well be the same thing as AVRO-782. Any extra
>>> information related to that issue would be welcome.
>>
>> I will have to collect it. In the meanwhile, do you have any reasonable
>> explanations of the issue besides it being something like AVRO-782?
>
> What is your key type (map output schema, first type argument of Pair)?
> Is your key a Utf8 or String? I don't have a reasonable explanation at
> this point; I haven't looked into it in depth with a good reproducible
> case. I have my suspicions about how recycling of the key works, since
> Utf8 is mutable and its backing byte[] can end up shared.
>
>> Thanks a lot,
>> Vyacheslav
>>
>>> Thanks!
>>>
>>> -Scott
>>>
>>> On 8/16/11 8:39 AM, "Vyacheslav Zholudev" <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have multiple Hadoop jobs that use the Avro mapred API.
>>>> Only in one of the jobs do I see a visible mismatch between the number
>>>> of map output records and reducer input records.
>>>>
>>>> Has anybody encountered such behavior? Can anybody think of possible
>>>> explanations for this phenomenon?
>>>>
>>>> Any pointers/thoughts are highly appreciated!
>>>>
>>>> Best,
>>>> Vyacheslav
>>
>> Best,
>> Vyacheslav
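[Editor's note: Scott's suspicion about key recycling can be illustrated with a small standalone sketch. The MutableKey class below is hypothetical, a stand-in for a mutable, buffer-backed key such as Avro's Utf8, not Avro's actual implementation; it shows how a framework that reuses one key instance and rewrites its buffer between records can corrupt a hash-based collection unless the key is copied before being stored:]

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable, byte[]-backed key (analogous to
// Avro's Utf8). A framework may recycle one instance and rewrite its
// backing buffer between calls, so a stored reference silently changes.
final class MutableKey {
    private byte[] bytes = new byte[0];

    MutableKey set(String s) {           // rewrites the backing buffer in place
        this.bytes = s.getBytes();
        return this;
    }

    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && Arrays.equals(bytes, ((MutableKey) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
    @Override public String toString() { return new String(bytes); }
}

public class KeyRecyclingSketch {
    public static void main(String[] args) {
        Map<MutableKey, Integer> counts = new HashMap<>();

        // Simulate a framework that recycles ONE key instance across records.
        MutableKey recycled = new MutableKey();
        counts.put(recycled.set("alpha"), 1);  // stored reference: recycled
        counts.put(recycled.set("beta"), 2);   // mutates the SAME object

        // The "alpha" entry's key now reads "beta", and it sits in a bucket
        // that no longer matches its contents, so the record seems to vanish.
        System.out.println("lookup alpha = "
                + counts.get(new MutableKey().set("alpha")));  // prints null

        // The safe pattern: copy the key (here via toString()) before storing.
        Map<String, Integer> safe = new HashMap<>();
        safe.put(recycled.set("alpha").toString(), 1);
        safe.put(recycled.set("beta").toString(), 2);
        System.out.println("safe lookup alpha = " + safe.get("alpha"));  // prints 1
    }
}
```

[This is only an analogy for why a mutable recycled key could make map-output and reducer-input record counts disagree; whether that is what AVRO-782 actually tracks would need the reproducible case Scott mentions.]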
