Thanks for clarifying, Gabriel.

On Tue, Sep 16, 2014 at 11:45 PM, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> Hi Krishna,
>
> > Does the bulk loader compress mapper output? I couldn't find anywhere
> > in the code where "mapreduce.map.output.compress" is set to true.
>
> The bulk loader doesn't explicitly enable compression on the map
> output, but if the client hadoop configuration (i.e. the
> mapred-site.xml on the machine where the job is kicked off) or the
> mapred-site.xml configs on the cluster specify it, then it will be
> used (as with all other mapreduce jobs).
>
> The reason for not specifying it directly in the code itself is that
> doing so creates a hard dependency on the compression codec(s) being
> available on the mapreduce cluster. I suppose these days Snappy and
> gzip are both generally available pretty much all the time, but I've
> been bitten by this in the past when a given codec wasn't available
> on a system but was specifically referenced from within code.
>
> Another option is to supply the compression settings as part of the
> job arguments via -D parameters, e.g.
> -Dmapreduce.map.output.compress=true.
>
> > Are HFiles compressed only if the Phoenix table (that data is being
> > imported to) is created with a compression parameter (ex:
> > COMPRESSION='GZ')?
>
> Yes, I believe this is indeed the case. The default behavior of
> HFileOutputFormat (as far as I know) is to take compression settings
> from the output table and apply them to the created HFiles.
>
> - Gabriel
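For the archives, here's roughly what the -D approach looks like when
kicking off the bulk load. This is just a sketch: the CsvBulkLoadTool
class name assumes a Phoenix 4.x client, the jar name, table name and
input path are placeholders, and the Snappy codec only works if the
native libraries are installed on the cluster (Gabriel's caveat above):

    # Generic -D options must come before the tool's own arguments
    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.output.compress=true \
        -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
        --table EXAMPLE_TABLE \
        --input /tmp/example.csv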
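And on the HFile side, since HFileOutputFormat takes its compression
settings from the target table, the compression would be declared when
the Phoenix table is created. A minimal example (table and columns made
up) would be something like:

    CREATE TABLE EXAMPLE_TABLE (
        ID BIGINT NOT NULL PRIMARY KEY,
        VAL VARCHAR
    ) COMPRESSION='GZ';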