Yes, Avro. A similar bug may exist in Avro's input buffering code.

Doug

On Dec 23, 2013 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <[email protected]> wrote:
> Hi Doug,
> Do you want me to raise a bug against Avro or Hadoop-Core? My guess is Avro.
> Regards,
> Deepak
>
> On Tue, Dec 24, 2013 at 12:10 AM, Doug Cutting <[email protected]> wrote:
>
>> This sounds like a bug.
>>
>> I wonder if it is similar to a related bug in Hadoop?
>>
>> https://issues.apache.org/jira/browse/HADOOP-9307
>>
>> If so, please file an issue in Jira.
>>
>> Doug
>>
>> On Sat, Dec 21, 2013 at 4:35 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:
>> > Hello,
>> > I have a 340 MB Avro data file that contains records sorted and identified
>> > by unique id (duplicate records exist). At the beginning of every unique
>> > record a synchronization point is created with DataFileWriter.sync(). (I
>> > cannot, or do not want to, save the sync points, and I do not want to use
>> > SortedKeyValueFile as the output format for the M/R job.)
>> >
>> > There are at least 25k synchronization points in the 340 MB file.
>> >
>> > Ex:
>> > Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
>> >
>> > As the records are sorted, for efficient retrieval a binary search is
>> > performed using the attached code.
>> >
>> > Most of the time the search is successful, but at times the code throws
>> > the following exception:
>> > ------
>> > org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
>> > at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
>> > ------
>> >
>> > Questions
>> > 1) Is it OK to have 25k sync points in a 300 MB file? Does it cost
>> > performance while reading?
>> > 2) I note down the position that was used to invoke fileReader.sync(mid).
>> > If I catch the AvroRuntimeException, close and reopen the file, and call
>> > sync(mid) again, I do not see the exception. Why would Avro throw the
>> > exception the first time and not later?
>> > 3) Is there a limit on the number of times sync() is invoked?
>> > 4) When sync(position) is invoked, is any position with
>> > 0 <= position <= file.size() valid? If yes, why do I see the
>> > AvroRuntimeException (#2)?
>> >
>> > Regards,
>> > Deepak
>
> --
> Deepak
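
The attached code is not part of this message, so the following is only a minimal sketch of the approach described in the original post: binary search over byte offsets, jumping forward from each midpoint to the next sync marker. It is not the poster's code and it does not use Avro. The toy file layout, the 4-byte marker, and the names `build`, `sync` and `find` are assumptions made for illustration. A real Avro data file uses a 16-byte per-file sync marker and would be read through `DataFileReader.sync(long)`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SyncSearchSketch {
    // Hypothetical 4-byte marker; assumed never to occur inside the key bytes.
    static final byte[] MARKER = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    /** Builds a toy "file": each block is MARKER followed by one 4-byte big-endian key. */
    public static byte[] build(int... sortedKeys) {
        ByteBuffer buf = ByteBuffer.allocate(sortedKeys.length * (MARKER.length + 4));
        for (int k : sortedKeys) {
            buf.put(MARKER).putInt(k);
        }
        return buf.array();
    }

    /** Returns the offset just past the first marker starting at or after pos, or -1 if none. */
    public static int sync(byte[] file, int pos) {
        for (int i = Math.max(pos, 0); i + MARKER.length <= file.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(file, i, i + MARKER.length), MARKER)) {
                return i + MARKER.length;
            }
        }
        return -1;
    }

    /** Binary search by byte offset; returns the offset of a matching key's payload, or -1. */
    public static int find(byte[] file, int key) {
        int lo = 0;
        int hi = file.length; // a matching block, if any, starts in [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            int at = sync(file, mid);
            if (at < 0 || at - MARKER.length >= hi) {
                hi = mid; // no block starts in [mid, hi)
                continue;
            }
            int found = ByteBuffer.wrap(file, at, 4).getInt();
            if (found == key) {
                return at;
            }
            if (found < key) {
                lo = at + 4; // skip past this block
            } else {
                hi = mid;    // a match can only start before mid
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] file = build(10, 20, 20, 30, 40);
        System.out.println(find(file, 30));
        System.out.println(find(file, 25));
    }
}
```

In this sketch any offset from 0 to the file length is an acceptable argument to `sync`, and a midpoint with no marker after it simply narrows the window instead of raising an error.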
