The SequenceFile header is expected to contain the full class names of the key class and value class, so if those are only determined once the first record is written (is that the case?), then indeed an empty file cannot respect its own format.
I haven't tried it, but LazyOutputFormat should solve your problem (a rough usage sketch follows the quoted message below):
https://hadoop.apache.org/docs/current/api/index.html?org/apache/hadoop/mapred/lib/LazyOutputFormat.html

Regards

Bertrand Dechoux

On Tue, Jul 22, 2014 at 10:39 PM, Edward Capriolo <[email protected]> wrote:
> I have two processes: one that writes sequence files directly to HDFS, the
> other a Hive table that reads these files.
>
> All works well, with the exception that I am only flushing the files
> periodically. SequenceFile input format gets angry when it encounters
> 0-byte seq files.
>
> I was considering flush and sync on the first record write. I was also
> thinking I should just be able to hack the sequence file input format to
> skip 0-byte files and not throw an exception on readFully(), which it
> sometimes does.
>
> Anyone ever tackled this?
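
For the LazyOutputFormat suggestion above, a minimal, untested sketch of how it would be wired in front of SequenceFileOutputFormat with the new mapreduce API (the job name, key/value types and output path are just placeholders, not anything from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LazySeqFileJob {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "lazy-seqfile");
    job.setJarByClass(LazySeqFileJob.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    // Wrap SequenceFileOutputFormat: the underlying output file is only
    // created when the first record is written, so tasks that emit nothing
    // leave no 0-byte part files behind.
    LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    // ... set mapper/reducer and input as usual, then job.waitForCompletion(true);
  }
}

Note that this only helps when the output is written through a MapReduce job; a process writing SequenceFiles directly to HDFS would still need the flush/sync or skip-empty-files approach discussed in the quoted message.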

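On the "skip 0-byte files" idea from the quoted message: instead of patching SequenceFileInputFormat itself, one option is a PathFilter registered on the reading job so that zero-length files are dropped before any record reader tries to parse a header. An untested sketch (the class name is made up):

import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Accepts directories and non-empty files; zero-length files are filtered out
// before the input format attempts to read a SequenceFile header from them.
public class NonEmptySeqFileFilter extends Configured implements PathFilter {
  @Override
  public boolean accept(Path path) {
    try {
      FileStatus status = path.getFileSystem(getConf()).getFileStatus(path);
      return status.isDirectory() || status.getLen() > 0;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}

It would be registered on a plain MapReduce job with something like
FileInputFormat.setInputPathFilter(job, NonEmptySeqFileFilter.class). Whether
Hive picks up such a filter depends on how its input format is wired, so the
table-read side may still need the tweaked SequenceFileInputFormat suggested
in the quoted message.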