For a long time we've had a workflow that looks like this: export data from a compressed ORC Hive table into another Hive table that is EXTERNAL and STORED AS TEXTFILE, with no compression specified on that table.
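For context, the export step is roughly this shape (a minimal sketch; the table name, columns, and path are hypothetical, not our real ones):

    -- Hypothetical names/columns; our real schema differs.
    CREATE EXTERNAL TABLE export_text (id STRING, val STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/x';

    -- Export from the compressed ORC source table.
    INSERT OVERWRITE TABLE export_text
    SELECT id, val FROM orc_source;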
Then we point to the folder "x" behind that new table and use CsvBulkInsert to load the data into HBase.

Today I noticed that no data has made it into HBase since late August. After some clicking around, it looks like this is because hive.exec.compress.output is set to true, so the data in folder "x" is written as compressed ".deflate" files. However, it also looks like that setting was changed to true 4 months ago, so either we should be missing 4 months of data (not just data since late August), or the load should still work with compressed files.

Hence my question: does CsvBulkInsert work with compressed output like this?
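For reference, if compressed output does turn out to be the problem, I assume we could work around it with a session-level override like the one below (same hypothetical names as the sketch above), but I'd like to understand whether that's actually necessary:

    -- Assumption: disable output compression for this session only,
    -- so the files under folder "x" come out as plain text again.
    SET hive.exec.compress.output=false;

    INSERT OVERWRITE TABLE export_text
    SELECT id, val FROM orc_source;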