Thanks for clarifying, Gabriel.

On Tue, Sep 16, 2014 at 11:45 PM, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> Hi Krishna,
>
> > Does the bulk loader compress mapper output? I couldn't find anywhere
> > in the code where "mapreduce.map.output.compress" is set to true.
>
> The bulk loader doesn't explicitly enable compression on the map
> output, but if the client hadoop configuration (i.e. the
> mapred-site.xml on the machine where the job is kicked off) or the
> mapred-site.xml configs on the cluster specify it, then it will be
> used (as with all other mapreduce jobs).
>
> The reason for not specifying it directly in the code itself is that
> doing so creates a hard dependency on the compression codec(s) being
> available on the mapreduce cluster. I suppose these days Snappy and
> gzip are both generally available pretty much all the time, but I've
> been bitten by this in the past when a given codec wasn't available
> on a system but was specifically referenced from within code.
>
> Another option is to supply the compression settings as part of the
> job arguments via -D parameters, e.g.
> -Dmapreduce.map.output.compress=true.
>
> > Are HFiles compressed only if the Phoenix table (that data is being
> > imported to) is created with a compression parameter (ex:
> > COMPRESSION='GZ')?
>
> Yes, I believe this is indeed the case. The default behavior of
> HFileOutputFormat (as far as I know) is to take compression settings
> from the output table and apply them to the created HFiles.
>
> - Gabriel
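For the archives, here's roughly what the -D approach looks like when
kicking off the bulk load. This is just a sketch: the CsvBulkLoadTool
class name assumes a Phoenix 4.x client, the jar name, table name and
input path are placeholders, and the Snappy codec only works if the
native libraries are installed on the cluster (Gabriel's caveat above):

    # Generic -D options must come before the tool's own arguments
    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        --table EXAMPLE_TABLE \
        --input /tmp/example.csv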
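And on the HFile side, since HFileOutputFormat takes its compression
settings from the target table, the compression would be declared when
the Phoenix table is created. A minimal example (table and columns made
up) would be something like:

    CREATE TABLE EXAMPLE_TABLE (
        ID BIGINT NOT NULL PRIMARY KEY,
        VAL VARCHAR
    ) COMPRESSION='GZ';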