No, what I mean is that your RecordReader should be able to handle the case where it starts from the middle of a record and hence cannot read a complete record right away (i.e. return false, or whatever your reader does, right up front).
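
For example, the two rules LineRecordReader applies to newlines carry over directly, with #Trailer# playing the role of the newline: a record belongs to the split that contains the first byte of its #Header# line, and the owning split keeps reading past its own end until it hits the #Trailer#. A rough, untested sketch against the new mapreduce API (the class name and the exact marker matching are illustrative only, not your actual code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class HeaderTrailerRecordReader extends RecordReader<LongWritable, Text> {

  private static final String HEADER = "#Header#";
  private static final String TRAILER = "#Trailer#";

  private long start, end, pos;
  private LineReader in;
  private final LongWritable key = new LongWritable();
  private final Text value = new Text();

  @Override
  public void initialize(InputSplit genericSplit, TaskAttemptContext context)
      throws IOException {
    FileSplit split = (FileSplit) genericSplit;
    Configuration conf = context.getConfiguration();
    start = split.getStart();
    end = start + split.getLength();
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(conf);
    FSDataInputStream fileIn = fs.open(file);
    if (start != 0) {
      // We may have landed mid-line: back up one byte and throw away the
      // (possibly partial) line we are in, just as LineRecordReader does.
      fileIn.seek(start - 1);
      in = new LineReader(fileIn, conf);
      start += in.readLine(new Text()) - 1;
    } else {
      in = new LineReader(fileIn, conf);
    }
    pos = start;
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    Text line = new Text();
    long recordStart;
    // Scan to the next #Header# line that starts inside this split. Anything
    // skipped here is the tail of a record owned by the previous split.
    while (true) {
      recordStart = pos;
      if (recordStart >= end) {
        return false;          // whatever starts here belongs to the next split
      }
      int read = in.readLine(line);
      if (read == 0) {
        return false;          // end of file
      }
      pos += read;
      if (line.toString().trim().equals(HEADER)) {
        break;
      }
    }
    // Read the record through its #Trailer#, even if that runs past `end`.
    StringBuilder record = new StringBuilder(HEADER).append('\n');
    while (true) {
      int read = in.readLine(line);
      if (read == 0) {
        break;                 // truncated record at end of file; keep what we have
      }
      pos += read;
      record.append(line.toString()).append('\n');
      if (line.toString().trim().equals(TRAILER)) {
        break;
      }
    }
    key.set(recordStart);
    value.set(record.toString());
    return true;
  }

  @Override
  public LongWritable getCurrentKey() { return key; }

  @Override
  public Text getCurrentValue() { return value; }

  @Override
  public float getProgress() {
    return end == start ? 0.0f : Math.min(1.0f, (pos - start) / (float) (end - start));
  }

  @Override
  public void close() throws IOException {
    if (in != null) in.close();
  }
}

With the boundaries handled this way there is no need to touch the InputSplit itself; the default splits from FileInputFormat are fine, and your InputFormat's createRecordReader() simply returns a reader like the one above.
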
On Wed, Aug 29, 2012 at 1:27 PM, Chen He <[email protected]> wrote:
> Hi Harsh
>
> Thank you for your reply. Do you mean I need to change the FileSplit to
> avoid the errors I mentioned?
>
> Regards!
>
> Chen
>
> On Wed, Aug 29, 2012 at 2:46 AM, Harsh J <[email protected]> wrote:
>>
>> Hi Chen,
>>
>> Does your record reader and mapper handle the case where one map split
>> may not exactly get the whole record? Your case is not very different
>> from the newlines logic presented here:
>> http://wiki.apache.org/hadoop/HadoopMapReduce
>>
>> On Wed, Aug 29, 2012 at 11:13 AM, Chen He <[email protected]> wrote:
>> > Hi guys
>> >
>> > I ran into an interesting problem when implementing my own custom
>> > InputFormat, which extends FileInputFormat. (I rewrote the RecordReader
>> > class but not the InputSplit class.)
>> >
>> > My RecordReader treats the following format as a basic record (it
>> > extends LineRecordReader and returns a record once it sees #Trailer#
>> > and the record contains #Header#; I have a single input file composed
>> > of many of these basic records):
>> >
>> > #Header#
>> > ..... (many lines; may be 0 lines or 1000 lines, it varies)
>> > #Trailer#
>> >
>> > Everything works fine when the number of basic records in the file is
>> > an integer multiple of the number of mappers. For example, 2 mappers
>> > with two basic records in the input file, or 3 mappers with 6 basic
>> > records.
>> >
>> > However, if I use 4 mappers and there are only 3 basic records in the
>> > input file (not an integer multiple), the final output is incorrect,
>> > and the "Map Input Bytes" job counter is also less than the input file
>> > size. How can I fix this? Do I need to rewrite the InputSplit?
>> >
>> > Any reply will be appreciated!
>> >
>> > Regards!
>> >
>> > Chen
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
