[jira] [Commented] (AVRO-1786) Strange IndexOutofBoundException in GenericDatumReader.readString

2017-08-02 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112014#comment-16112014
 ] 

BELUGA BEHR commented on AVRO-1786:
---

[~cutting] [~rdblue] I've been tearing through the Avro and MapReduce code 
looking for a chink in the armor, but no luck.  Any thoughts on where I can 
focus my search?

> Strange IndexOutofBoundException in GenericDatumReader.readString
> -
>
> Key: AVRO-1786
> URL: https://issues.apache.org/jira/browse/AVRO-1786
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.4, 1.7.7
> Environment: CentOS 6.5 Linux x64, 2.6.32-358.14.1.el6.x86_64
> Use IBM JVM:
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 
> 20140515_199835 (JIT enabled, AOT enabled)
>Reporter: Yong Zhang
>Priority: Minor
>
> Our production cluster is CentOS 6.5 (2.6.32-358.14.1.el6.x86_64), running 
> IBM BigInsight V3.0.0.2. In Apache terms, that is Hadoop 2.2.0 with MRv1 (no 
> YARN), and it comes with Avro 1.7.4, running on IBM J9 VM (build 2.7, JRE 
> 1.7.0 Linux amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT 
> enabled)). Not sure if the JDK matters, but it is NOT an Oracle JVM.
> We have an ETL implemented as a chain of MR jobs. One MR job merges two sets 
> of Avro data: Dataset1 is in HDFS location A, Dataset2 is in HDFS location B, 
> and both contain Avro records bound to the same Avro schema. Each record 
> contains a unique ID field and a timestamp field. The MR job merges the 
> records by ID, keeps the record with the later timestamp over the one with 
> the earlier timestamp, and emits the final Avro record. Very straightforward.
> Now we face a problem: one reducer keeps failing with the following stack 
> trace on the JobTracker:
> {code}
> java.lang.IndexOutOfBoundsException
>   at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
>   at java.io.DataInputStream.read(DataInputStream.java:160)
>   at 
> org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
>   at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
>   at 
> org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
>   at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
>   at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>   at 
> org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>   at 
> org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>   at 
> org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>   at 
> org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>   at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:366)
>   at javax.security.auth.Subject.doAs(Subject.java:572)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> Here are my Mapper and Reducer method signatures:
> Mapper:
> {code}
> public void map(AvroKey key, NullWritable value, Context context) 
>     throws IOException, InterruptedException 
> {code}
> Reducer:
> {code}
> protected void reduce(CustomPartitionKeyClass key, Iterable> values, 
>     Context context) throws IOException, InterruptedException 
> {code}
> What bothers me are the following facts:
> 1) All the mappers finish without error.
> 2) Most of the reducers finish without error, but one reducer keeps failing 
> with the above error.
> 3) It looks like it is caused by the data? But keep in mind that all the 
> Avro records passed through the mapper side, yet one reducer fails.
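One detail worth noting about the stack trace above: java.io.ByteArrayInputStream.read(byte[], int, int) throws IndexOutOfBoundsException when the requested offset or length is invalid (negative, or past the end of the target buffer), and Avro's DirectBinaryDecoder.doReadBytes passes a length decoded from the input stream more or less straight into that read call. So a corrupted or misaligned length varint in the serialized bytes is one plausible way to hit exactly this exception. The sketch below only demonstrates the JDK behavior with a simulated negative length; it is an assumption for illustration, not data taken from the failing job.

```java
import java.io.ByteArrayInputStream;

public class NegativeLengthDemo {
    // Attempt to read `len` bytes from a small stream and report whether the
    // JDK rejected the request with IndexOutOfBoundsException.
    static boolean threwIOOBE(int len) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        byte[] buf = new byte[8];
        try {
            in.read(buf, 0, len);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A decoded length that went negative (simulated corruption) fails
        // before any data is copied; a sane length reads normally.
        System.out.println("len=-5 rejected: " + threwIOOBE(-5));
        System.out.println("len=3  rejected: " + threwIOOBE(3));
    }
}
```

If this is what is happening, the interesting question becomes why the bytes handed to the decoder are corrupt on exactly one reducer (e.g. shuffle-side buffering or a partition-specific record), rather than anything in the read path itself.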

[jira] [Commented] (AVRO-2059) Remove support of Hadoop 1

2017-08-02 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111828#comment-16111828
 ] 

Doug Cutting commented on AVRO-2059:


Please also update the release instructions.

https://cwiki.apache.org/confluence/display/AVRO/How+To+Release

> Remove support of Hadoop 1
> --
>
> Key: AVRO-2059
> URL: https://issues.apache.org/jira/browse/AVRO-2059
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.9.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
> Fix For: 1.9.0
>
>
> Remove support for Hadoop 1 in the next major release, 1.9.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)