I have two processes: one writes SequenceFiles directly to HDFS, and the other reads those files through a Hive table.
Everything works, except that I only flush the files periodically, and SequenceFileInputFormat fails when it encounters zero-byte SequenceFiles. I was considering doing a flush and sync on the first record write. I was also thinking I could just patch SequenceFileInputFormat to skip zero-byte files instead of throwing an exception from readFully(), which it sometimes does. Has anyone tackled this?
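Here is a minimal sketch of the skip-zero-byte-files idea. It is written in plain Java so it runs without Hadoop on the classpath. FileEntry is a hypothetical stand-in for Hadoop's FileStatus, and the paths are made up. The wiring into Hadoop is an assumption that has not been verified against Hive. In the old mapred API, the natural hook would be a subclass of SequenceFileInputFormat that overrides listStatus(JobConf) and applies the same length > 0 filter to the FileStatus[] returned by super.listStatus(job). Whether Hive's split computation actually goes through that method depends on the configured hive.input.format.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the "skip zero-byte files" idea. FileEntry is a hypothetical
 * stand-in for Hadoop's FileStatus (path + length); in a real job the same
 * filter would (by assumption) run over the FileStatus[] returned by the
 * input format's listStatus(JobConf).
 */
public class SkipEmptySeqFiles {

    /** Hypothetical minimal view of one entry in a directory listing. */
    static final class FileEntry {
        final String path;
        final long length;

        FileEntry(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Returns only the entries that contain at least one byte, in listing order. */
    static List<FileEntry> skipEmpty(List<FileEntry> listing) {
        List<FileEntry> kept = new ArrayList<>();
        for (FileEntry e : listing) {
            // A 0-byte file has no SequenceFile header yet, so a reader
            // opening it would hit EOF while reading that header.
            if (e.length > 0) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical listing: one file not yet flushed, one with data.
        List<FileEntry> listing = new ArrayList<>();
        listing.add(new FileEntry("/data/part-0001.seq", 0L));
        listing.add(new FileEntry("/data/part-0002.seq", 4096L));
        for (FileEntry e : skipEmpty(listing)) {
            System.out.println(e.path);
        }
    }
}
```

Running main prints only /data/part-0002.seq. A length check catches only the empty case. A file whose header has been only partly flushed would still fail when the reader parses the header, so this filter complements the flush-and-sync-on-first-record idea rather than replacing it.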
