Out of curiosity I looked at the source for the latest version (2.4): the header is flushed when the writer is created. The key/value classes are, of course, provided at that point. By "0-bytes", do you really mean files without even the header, or files with a header but 0 bytes of payload?
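To tell the two cases apart, here is a minimal sketch of a check. It relies only on the fact that a sequence file starts with the three magic bytes 'S', 'E', 'Q' (followed by a version byte). The names SeqFileProbe, Kind and probe are hypothetical, nothing here is part of Hadoop, and it reads a local file through java.nio as a stand-in; against HDFS the same logic would have to go through the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Hadoop.
final class SeqFileProbe {

    enum Kind {
        EMPTY,               // truly 0 bytes: not even the header was written
        HEADER_STARTED,      // begins with the 'SEQ' magic; payload may still be empty
        NOT_A_SEQUENCE_FILE  // non-empty but does not begin with the magic bytes
    }

    // A sequence file begins with these three bytes, followed by a version byte.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    static Kind probe(Path file) throws IOException {
        if (Files.size(file) == 0) {
            return Kind.EMPTY;
        }
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
        }
        if (read == MAGIC.length && Arrays.equals(head, MAGIC)) {
            return Kind.HEADER_STARTED;
        }
        return Kind.NOT_A_SEQUENCE_FILE;
    }
}
```

A reader could skip files reported as EMPTY and only open the others. Note that HEADER_STARTED does not guarantee a complete header: the key and value class names follow the magic and could still be truncated.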
On Tue, Jul 22, 2014 at 11:05 PM, Bertrand Dechoux <[email protected]> wrote:
> The header is expected to have the full name of the key class and value
> class so if it is only detected with the first record (?) indeed the file
> can not respect its own format.
>
> I haven't tried it but LazyOutputFormat should solve your problem.
>
> https://hadoop.apache.org/docs/current/api/index.html?org/apache/hadoop/mapred/lib/LazyOutputFormat.html
>
> Regards
>
> Bertrand Dechoux
>
> On Tue, Jul 22, 2014 at 10:39 PM, Edward Capriolo <[email protected]>
> wrote:
>
>> I have two processes. One that writes sequence files directly to hdfs,
>> the other that is a hive table that reads these files.
>>
>> All works well with the exception that I am only flushing the files
>> periodically. SequenceFile input format gets angry when it encounters
>> 0-bytes seq files.
>>
>> I was considering flush and sync on first record write. Also was thinking
>> should just be able to hack sequence file input format to skip 0 byte files
>> and not throw exception on readFully() which it sometimes does.
>>
>> Anyone ever tackled this?
