Hi there Hive Groupers, I've got a question about Hive architecture with regard to compression, or more specifically, how Hive treats compressed tables when it reads from them.
Use Case:
1. Two compressed tables in HDFS, 1 TB each.
2. One table is compressed with a splittable codec (e.g. bzip2), while the other is not (e.g. gzip).
3. A MapReduce job reads each table and writes a new text-only, uncompressed table (around 4 TB).

What happens when MapReduce accesses the compressed tables?
* Is the data decompressed on HDFS, or in the local nodes' temp storage?
* Is the compressed data written to disk, or piped directly to the MapReduce job?
* During the shuffle phase, is the uncompressed data saved on the local nodes?

Thank you so much!
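For concreteness, a minimal HiveQL sketch of a setup like the one described (table and column names are illustrative; bzip2 is a splittable codec, gzip is not):

```
-- Table 1: text files compressed with a splittable codec (bzip2).
-- Files loaded as .bz2 can be split, so multiple mappers can read one file.
CREATE TABLE events_bz2 (line STRING)
STORED AS TEXTFILE;

-- Table 2: the same data compressed with gzip (not splittable:
-- each .gz file must be processed end-to-end by a single mapper).
CREATE TABLE events_gz (line STRING)
STORED AS TEXTFILE;

-- Rewrite either table as an uncompressed text table.
SET hive.exec.compress.output=false;
CREATE TABLE events_plain
STORED AS TEXTFILE
AS SELECT * FROM events_bz2;
```

With the gzip table, the number of mappers is bounded by the number of files; with the bzip2 table, HDFS block boundaries can drive the split count, which is usually why the splittable variant reads faster at this scale.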
