Hi Zack,

Am I correct in understanding that the files are under a structure like x/.deflate/csv_file.csv ?
In that case, I believe everything under the .deflate directory will simply be ignored, since directories whose names start with a period are treated as "hidden" files and skipped when the input is listed. However, assuming the data under those directories is compressed with a codec supported on your cluster (e.g. gz, snappy, etc.), there shouldn't be a problem using it as input for the CSV import. In other words, the compression probably isn't the issue, but the directory naming probably is.

I've put two quick sketches below your quoted message: one that lists what Hive actually wrote under "x", and one showing the hidden-path check I'm referring to.

- Gabriel

On Thu, Sep 29, 2016 at 7:14 PM, Riesland, Zack <zack.riesl...@sensus.com> wrote:
> For a very long time, we’ve had a workflow that looks like this:
>
> Export data from a compressed, ORC Hive table to another Hive table that is
> “external stored as text file”. No compression specified.
>
> Then, we point to the folder “x” behind that new table and use CsvBulkInsert
> to get data to HBase.
>
> Today, I noticed that the data has not been getting into HBase since late
> August.
>
> After some clicking around, it looks like this is happening because we have
> hive.exec.compress.output set to true, so the data in folder “x” is
> compressed in “.deflate” folders.
>
> However, it looks like someone changed this setting to true 4 months ago.
>
> So we should either be missing 4 months of data, or this should work.
>
> Thus my question: does CSV bulk insert work with compressed output like
> this?
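
To confirm the layout first, something along these lines should show whether Hive wrote ".deflate" directories or just files with a .deflate extension. The class name and the "/x" path are placeholders for your actual export directory, not anything from your setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListExportDir {
        public static void main(String[] args) throws Exception {
            // Uses whatever fs.defaultFS points at (HDFS when run on a cluster node).
            FileSystem fs = FileSystem.get(new Configuration());
            // "/x" stands in for the directory behind your external text table.
            for (FileStatus s : fs.listStatus(new Path("/x"))) {
                System.out.println((s.isDirectory() ? "dir  " : "file ") + s.getPath().getName());
            }
        }
    }

If that prints only files named like 000000_0.deflate, the hidden-directory concern above doesn't apply and we'd need to look elsewhere.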
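
For reference, the default input listing in MapReduce-based jobs skips any path whose name starts with an underscore or a period. The snippet below is my own rough sketch of that check (class and field names are made up; it is not the actual FileInputFormat code), but it shows why a ".deflate" directory ends up invisible to the bulk load while a file ending in .deflate does not:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    public class HiddenPathCheck {
        // Rough equivalent of the default listing filter: any path component whose
        // name starts with "_" or "." is skipped, and a skipped directory is never
        // descended into, so nothing under it is read.
        static final PathFilter NOT_HIDDEN = new PathFilter() {
            public boolean accept(Path p) {
                String name = p.getName();
                return !name.startsWith("_") && !name.startsWith(".");
            }
        };

        public static void main(String[] args) {
            System.out.println(NOT_HIDDEN.accept(new Path("/x/.deflate")));         // false: the directory is skipped
            System.out.println(NOT_HIDDEN.accept(new Path("/x/000000_0.deflate"))); // true: a .deflate file is read
        }
    }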