Hi Krishna,

> Does the bulk loader compress mapper output? I couldn't find anywhere in the
> code where "mapreduce.map.output.compress" is set to true.

The bulk loader doesn't explicitly enable compression on the map output, but if the client Hadoop configuration (i.e. the mapred-site.xml on the machine where the job is kicked off) or the mapred-site.xml on the cluster specifies it, then it will be used, just as with any other MapReduce job. The reason for not setting it directly in the code is that doing so creates a hard dependency on the compression codec(s) being available on the MapReduce cluster. I suppose these days Snappy and gzip are both generally available pretty much all the time, but I've been bitten in the past by code that referenced a specific codec which wasn't available on a given system. Another option is to supply the compression settings as part of the job arguments via -D parameters, i.e. -Dmapreduce.map.output.compress=true.
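If you'd rather wire it up programmatically than via -D flags, something along these lines should work. This is just a rough sketch (untested); CsvBulkLoadTool is the Phoenix CSV bulk load entry point, and the choice of Snappy here is only an example, so substitute whatever codec is actually installed on your cluster:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

    public class CompressedBulkLoad {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same effect as passing -Dmapreduce.map.output.compress=true
            // on the command line.
            conf.setBoolean("mapreduce.map.output.compress", true);
            // Codec is optional; this assumes Snappy is installed on the
            // cluster nodes.
            conf.set("mapreduce.map.output.compress.codec",
                    "org.apache.hadoop.io.compress.SnappyCodec");
            System.exit(ToolRunner.run(conf, new CsvBulkLoadTool(), args));
        }
    }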
> Are HFiles compressed only if the Phoenix table (that data is being imported
> to) is created with compression parameter (ex: COMPRESSION='GZ')?

Yes, I believe this is indeed the case. The default behavior of HFileOutputFormat (as far as I know) is to take the compression settings from the output table and apply them to the created HFiles, so the bulk loader itself doesn't configure HFile compression at all.

- Gabriel
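P.S. For illustration, here's roughly what that mechanism looks like at the HBase API level. This is a minimal sketch (untested, and the table name "MY_TABLE" is made up): configureIncrementalLoad() reads the table's column family descriptors, including any compression set via Phoenix's COMPRESSION='GZ', and configures the output format to match, which is why no compression settings need to appear in the job code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class HFileCompressionSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "hfile-compression-sketch");
            HTable table = new HTable(conf, "MY_TABLE");
            // Pulls compression (and bloom filter, block size, etc.)
            // settings from the table's column family descriptors and
            // applies them to the HFiles this job will write.
            HFileOutputFormat.configureIncrementalLoad(job, table);
            table.close();
        }
    }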