Hi, 
I'm using a map-reduce job with HFileOutputFormat followed by bulk loads/merges 
to create and populate a table with multiple column families.  I would like to 
understand how compression works, and how to specify a non-default compression 
in this setup.  So: 
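To make the setup concrete, my driver looks roughly like this (a simplified sketch; the table and job names are placeholders, and I'm not sure whether setting hfile.compression in the job configuration is even the right approach -- that's part of what I'm asking):

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-prepare");
    HTable table = new HTable(conf, "my_table");
    // sets reducer, partitioner, and HFileOutputFormat from the table's regions
    HFileOutputFormat.configureIncrementalLoad(job, table);
    // is this where a non-default compression should be specified?
    conf.set("hfile.compression", Compression.Algorithm.GZ.getName());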

AFAIK, there are two relevant switches: the per-column-family compression 
configuration and hfile.compression.  Are there any others? 
Can the compression format be deduced from the contents of an HFile, or does the 
format of a region store file have to match the family's configuration? 
Can a column family's compression format be changed if it already contains some 
data?  If so, how is this done?  Are the family store files converted to the 
new format before the table comes back online, or is it a lazy-update, or just 
a compaction-time thing? 
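For context on that last question, I'm guessing the shell sequence would be something like the following (just my assumption of the flow; I don't know what it actually does to the existing store files, which is what I'm asking):

    hbase> disable 'my_table'
    hbase> alter 'my_table', {NAME => 'cf1', COMPRESSION => 'GZ'}
    hbase> enable 'my_table'
    hbase> major_compact 'my_table'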
Is it possible to write updates for multiple families with different 
compression formats in the same map-reduce job? 
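In case it's relevant, the kind of per-family setup I have in mind at table-creation time is along these lines (a sketch; family names are placeholders):

    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("my_table");
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    cf1.setCompressionType(Compression.Algorithm.GZ);   // gzip for cf1
    desc.addFamily(cf1);
    HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
    cf2.setCompressionType(Compression.Algorithm.NONE); // no compression for cf2
    desc.addFamily(cf2);
    admin.createTable(desc);

The question is whether a single HFileOutputFormat job can honor both settings at once.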
Can HFileOutputFormat.configureIncrementalLoad infer the compression format from an 
existing table, just as it does for partitioning? 
Is there a way to specify a default compression which is not None, so that new 
tables and families are automatically compressed (with gzip for example)? 
I have seen archived discussions which refer to RECORD vs BLOCK compression, 
but I don't see those options in later versions.  Have they gone away? 

Thanks, 
--Adam
                                          
