Hi Scott,

The pair types are Pair<CharSequence, SomeSpecificJavaClass>, but in essence,
when I call "collect()" I always provide a java.lang.String object.
The reduce method is reduce(CharSequence key, Iterable<SomeSpecificJavaClass> values, .....)

Some more detailed info:

the jobtracker and namenode run with:
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

the tasktrackers and datanodes run with:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Hadoop version is: cdh3u1

Thanks for suggestions,
Vyacheslav

On Aug 17, 2011, at 3:56 AM, Scott Carey wrote:

> On 8/16/11 3:56 PM, "Vyacheslav Zholudev" <[email protected]>
> wrote:
>
>> Hi, Scott,
>>
>> thanks for your reply.
>>
>>> What Avro version is this happening with? What JVM version?
>>
>> We are using Avro 1.5.1 and Sun JDK 6, but the exact version I will have
>> to look up.
>>
>>> On a hunch, have you tried adding -XX:-UseLoopPredicate to the JVM args
>>> if it is Sun and JRE 6u21 or later? (Some issues in loop predicates
>>> affect Java 6 too, just not as many as the recent news on Java 7.)
>>>
>>> Otherwise, it may well be the same thing as AVRO-782. Any extra
>>> information related to that issue would be welcome.
>>
>> I will have to collect it. In the meanwhile, do you have any reasonable
>> explanations of the issue besides it being something like AVRO-782?
>
> What is your key type (map output schema, first type argument of Pair)?
> Is your key a Utf8 or String? I don't have a reasonable explanation at
> this point; I haven't looked into it in depth with a good reproducible
> case. I have my suspicions about how recycling of the key works, since
> Utf8 is mutable and its backing byte[] can end up shared.
>
>> Thanks a lot,
>> Vyacheslav
>>
>>> Thanks!
>>>
>>> -Scott
>>>
>>> On 8/16/11 8:39 AM, "Vyacheslav Zholudev" <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have multiple Hadoop jobs that use the Avro mapred API.
>>>> Only in one of the jobs do I see a visible mismatch between the number
>>>> of map output records and reducer input records.
>>>>
>>>> Has anybody encountered such behavior? Can anybody think of possible
>>>> explanations for this phenomenon?
>>>>
>>>> Any pointers/thoughts are highly appreciated!
>>>>
>>>> Best,
>>>> Vyacheslav
>>
>> Best,
>> Vyacheslav
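[Editor's note: Scott's suspicion about key recycling can be illustrated with a small standalone sketch. The MutableKey class below is hypothetical, a stand-in for a mutable, buffer-backed key such as Avro's Utf8, not Avro's actual implementation; it shows how a framework that reuses one key instance and rewrites its buffer between records can corrupt a hash-based collection unless the key is copied before being stored:]

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable, byte[]-backed key (analogous to
// Avro's Utf8). A framework may recycle one instance and rewrite its
// backing buffer between calls, so a stored reference silently changes.
final class MutableKey {
    private byte[] bytes = new byte[0];

    MutableKey set(String s) {           // rewrites the backing buffer in place
        this.bytes = s.getBytes();
        return this;
    }

    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && Arrays.equals(bytes, ((MutableKey) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
    @Override public String toString() { return new String(bytes); }
}

public class KeyRecyclingSketch {
    public static void main(String[] args) {
        Map<MutableKey, Integer> counts = new HashMap<>();

        // Simulate a framework that recycles ONE key instance across records.
        MutableKey recycled = new MutableKey();
        counts.put(recycled.set("alpha"), 1);  // stored reference: recycled
        counts.put(recycled.set("beta"), 2);   // mutates the SAME object

        // The "alpha" entry's key now reads "beta", and it sits in a bucket
        // that no longer matches its contents, so the record seems to vanish.
        System.out.println("lookup alpha = "
                + counts.get(new MutableKey().set("alpha")));  // prints null

        // The safe pattern: copy the key (here via toString()) before storing.
        Map<String, Integer> safe = new HashMap<>();
        safe.put(recycled.set("alpha").toString(), 1);
        safe.put(recycled.set("beta").toString(), 2);
        System.out.println("safe lookup alpha = " + safe.get("alpha"));  // prints 1
    }
}
```

[This is only an analogy for why a mutable recycled key could make map-output and reducer-input record counts disagree; whether that is what AVRO-782 actually tracks would need the reproducible case Scott mentions.]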
