Hi Krishna,

> Does the bulk loader compress mapper output? I couldn't find anywhere in the
> code where "mapreduce.map.output.compress" is set to true.

The bulk loader doesn't explicitly enable compression on the map output, but if the client Hadoop configuration (i.e. the mapred-site.xml on the machine where the job is kicked off) or the mapred-site.xml on the cluster specifies it, then it will be used, just as with any other MapReduce job. The reason for not setting it directly in the code is that doing so creates a hard dependency on the compression codec(s) being available on the MapReduce cluster. I suppose these days Snappy and gzip are both generally available pretty much all the time, but I've been bitten in the past by code that referenced a specific codec which wasn't available on a given system. Another option is to supply the compression settings as part of the job arguments via -D parameters, i.e. -Dmapreduce.map.output.compress=true.
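If you'd rather wire it up programmatically than via -D flags, something along these lines should work. This is just a rough sketch (untested); CsvBulkLoadTool is the Phoenix CSV bulk load entry point, and the choice of Snappy here is only an example, so substitute whatever codec is actually installed on your cluster:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

    public class CompressedBulkLoad {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same effect as passing -Dmapreduce.map.output.compress=true
            // on the command line.
            conf.setBoolean("mapreduce.map.output.compress", true);
            // Codec is optional; this assumes Snappy is installed on the
            // cluster nodes.
            conf.set("mapreduce.map.output.compress.codec",
                    "org.apache.hadoop.io.compress.SnappyCodec");
            System.exit(ToolRunner.run(conf, new CsvBulkLoadTool(), args));
        }
    }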
> Are HFiles compressed only if the Phoenix table (that data is being imported
> to) is created with compression parameter (ex: COMPRESSION='GZ')?

Yes, I believe this is indeed the case. The default behavior of HFileOutputFormat (as far as I know) is to take the compression settings from the output table and apply them to the created HFiles, so the bulk loader itself doesn't configure HFile compression at all.

- Gabriel
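P.S. For illustration, here's roughly what that mechanism looks like at the HBase API level. This is a minimal sketch (untested, and the table name "MY_TABLE" is made up): configureIncrementalLoad() reads the table's column family descriptors, including any compression set via Phoenix's COMPRESSION='GZ', and configures the output format to match, which is why no compression settings need to appear in the job code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class HFileCompressionSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "hfile-compression-sketch");
            HTable table = new HTable(conf, "MY_TABLE");
            // Pulls compression (and bloom filter, block size, etc.)
            // settings from the table's column family descriptors and
            // applies them to the HFiles this job will write.
            HFileOutputFormat.configureIncrementalLoad(job, table);
            table.close();
        }
    }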