> We are pushing the compressed text files into HDFS directory for Hive EXTERNAL table, then using an INSERT on the table using ORC storage. We are letting Hive handle the ORC file creation process.
Are the compressed text files small enough to process one by one?

I did write something similar last year for an EBCDIC case. The only thing it can't do is split a file half-way through, so each file is processed as a single stream with a simple state machine.

Cheers,
Gopal