Hi there Hive Groupers,

I've got a question about Hive's architecture with respect to compression,
or rather, how Hive treats compressed tables when it reads from them.

Use Case:
1. Two compressed tables in HDFS, 1 TB each.
2. One table is compressed with a splittable codec, while the other
isn't.
3. A MapReduce program reads each table and writes out a new text-only
table (uncompressed, around 4 TB).
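For anyone following along, the splittable/non-splittable distinction in step 2 can be sketched outside of Hadoop. This is only a Python simulation of the idea, not Hive's actual read path: a non-splittable codec like gzip is one continuous stream, so a reader cannot start from an arbitrary offset and the whole file goes to a single mapper, whereas a splittable layout (conceptually what bzip2 or indexed LZO provide) is made of independently decompressible blocks, simulated here as separate gzip members:

```python
import gzip

# Non-splittable: a single gzip stream must be read from byte 0;
# decompression cannot begin at an arbitrary offset.
data = b"some,row,data\n" * 10_000
gz = gzip.compress(data)

assert gzip.decompress(gz) == data  # reading from the start works

try:
    # Starting mid-stream has no gzip header, so this fails,
    # which is why Hadoop cannot split a plain .gz file.
    gzip.decompress(gz[len(gz) // 2:])
    mid_stream_failed = False
except Exception:
    mid_stream_failed = True

# Splittable (simulated): independently compressed blocks, each
# decompressible on its own, so each split can go to its own mapper.
chunks = [(b"%d,row\n" % i) * 1_000 for i in range(4)]
members = [gzip.compress(c) for c in chunks]
recovered = [gzip.decompress(m) for m in members]
```

In real Hadoop the split points come from the codec's block markers (bzip2) or a side index (LZO); the member boundaries above just stand in for those.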

What happens when MapReduce accesses the compressed tables?
* Is the data decompressed on HDFS or in the local nodes' temp storage?
* Is the compressed data saved to disk or piped directly into the
MapReduce job?
* In the shuffle phase, is the uncompressed data saved on the local
nodes?

Thank you so much!
