Thank you, Jerry.
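
In case it helps anyone else reading the thread, here is a minimal sketch of your second suggestion (subclassing HFileOutputFormat2 and overriding getOutputCommitter()). The names DirectHFileOutputFormat and NoOpCommitter are made up for illustration; NoOpCommitter is only a stand-in for whatever direct committer class the third-party package actually provides. I have not checked whether HFileOutputFormat2's record writer still writes under a FileOutputCommitter work path; if it does, overriding the committer alone would not be enough.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical subclass: same HFile writing logic, different committer.
public class DirectHFileOutputFormat extends HFileOutputFormat2 {

  // Stand-in for a "direct" committer (assumption, not the third-party class):
  // every hook is a no-op, so nothing is moved or renamed at commit time.
  static class NoOpCommitter extends OutputCommitter {
    @Override
    public void setupJob(JobContext jobContext) throws IOException {}

    @Override
    public void setupTask(TaskAttemptContext taskContext) throws IOException {}

    @Override
    public boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException {
      return false; // tells the framework to skip the task commit step
    }

    @Override
    public void commitTask(TaskAttemptContext taskContext) throws IOException {}

    @Override
    public void abortTask(TaskAttemptContext taskContext) throws IOException {}
  }

  // FileOutputFormat returns a FileOutputCommitter here; this override
  // returns the stand-in committer instead.
  @Override
  public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException {
    return new NoOpCommitter();
  }
}
```

As far as I can tell, configureIncrementalLoad() sets the job's output format class itself, so job.setOutputFormatClass(DirectHFileOutputFormat.class) would presumably have to be called after it for the subclass to take effect.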

On Thu, Mar 16, 2017 at 5:28 PM, Jerry He <[email protected]> wrote:

> I think you are right.  FileOutputFormat has a default hard-coded
> FileOutputCommitter.
>
> If you want to use DirectOutputCommitter, check the third-party patched
> Hadoop package that provides this class for how to set it.
>
> Or you can extend HFileOutputFormat2 and provide a getOutputCommitter()
> implementation that returns DirectOutputCommitter.
>
> Jerry
>
>
> On Thu, Mar 16, 2017 at 9:29 AM, Fran O <[email protected]> wrote:
>
> > Hi folks,
> >
> > I would like to hear some thoughts on the following use case:
> >
> > I use a custom MR job to create HFiles. This MR job writes the HFiles to
> > S3.
> >
> > I was trying to change the OutputCommitter so that the reducers write
> > the HFiles directly to the final destination on S3. After some tests
> > setting the OutputCommitter to DirectOutputCommitter, the tasks are
> > still always using the FileOutputCommitter.
> >
> > >> HFileOutputFormat2.configureIncrementalLoad(job, hTable);
> > >> FileOutputFormat.setOutputPath(job, outputPath);
> > >> FileOutputFormat.setCompressOutput(job, true);
> > >> FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
> >
> > Looking at the code of the FileOutputFormat methods
> > <https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html>
> > I see a getOutputCommitter method
> > <https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext)>
> > but not a set method for the OutputCommitter.
> >
> > Could someone shed some light on how to change the OutputCommitter for
> > the tasks?
> >
> > Thank you,
> > Fran
> >
>
