Thank you Jerry.
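
For the archives, here is a rough sketch of the second suggestion quoted below (subclass the output format and override getOutputCommitter()). It uses small stand-in classes so that it compiles without Hadoop or HBase on the classpath. The stand-in names, the DirectHFileOutputFormat subclass, and the s3://bucket/hfiles path are all hypothetical. In a real job the override would take a TaskAttemptContext and return the committer class shipped by the third-party package.

```java
public class CommitterOverrideSketch {

    // Stand-in for org.apache.hadoop.mapreduce.OutputCommitter (heavily simplified).
    interface OutputCommitter {
        String name();
    }

    // Stand-in for FileOutputFormat: the committer is created inside
    // getOutputCommitter(), and there is no setter to swap it out.
    static class FileOutputFormatStandIn {
        public OutputCommitter getOutputCommitter(String outputPath) {
            return () -> "FileOutputCommitter(" + outputPath + ")";
        }
    }

    // Stand-in for HFileOutputFormat2, which inherits the default committer.
    static class HFileOutputFormat2StandIn extends FileOutputFormatStandIn { }

    // Hypothetical subclass: overrides the getter to hand back a direct committer.
    static class DirectHFileOutputFormat extends HFileOutputFormat2StandIn {
        @Override
        public OutputCommitter getOutputCommitter(String outputPath) {
            return () -> "DirectOutputCommitter(" + outputPath + ")";
        }
    }

    // Models the framework side: it asks the job's output format for its committer.
    static String committerFor(FileOutputFormatStandIn format, String outputPath) {
        return format.getOutputCommitter(outputPath).name();
    }

    public static void main(String[] args) {
        String path = "s3://bucket/hfiles"; // hypothetical output location
        System.out.println(committerFor(new HFileOutputFormat2StandIn(), path));
        System.out.println(committerFor(new DirectHFileOutputFormat(), path));
    }
}
```

In the real job, the subclass would presumably need to be registered with job.setOutputFormatClass(...) after the call to HFileOutputFormat2.configureIncrementalLoad(job, hTable), since that call sets the output format itself. Depending on the HBase version, the HFile record writer may still derive its work path from a FileOutputCommitter, so the override alone might not be enough and is worth testing.
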

On Thu, Mar 16, 2017 at 5:28 PM, Jerry He <[email protected]> wrote:

> I think you are right. FileOutputFormat has a default hard-coded
> FileOutputCommitter.
>
> If you want to use DirectOutputCommitter, check the third-party patched
> Hadoop package that provides this class to see how to set it.
>
> Or you can extend HFileOutputFormat2 and provide a getOutputCommitter()
> implementation that returns DirectOutputCommitter.
>
> Jerry
>
>
> On Thu, Mar 16, 2017 at 9:29 AM, Fran O <[email protected]> wrote:
>
> > Hi folks,
> >
> > I would like to hear some thoughts on the following use case:
> >
> > I use a custom MR job to create HFiles. This MR job writes the HFiles
> > to S3.
> >
> > I was trying to change the OutputCommitter so that the reducers write
> > the HFiles directly to the final destination on S3. After some tests
> > with the OutputCommitter set to DirectOutputCommitter, the tasks are
> > still always using the FileOutputCommitter.
> >
> >     HFileOutputFormat2.configureIncrementalLoad(job, hTable);
> >     FileOutputFormat.setOutputPath(job, outputPath);
> >     FileOutputFormat.setCompressOutput(job, true);
> >     FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
> >
> > Looking at the FileOutputFormat methods
> > <https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html>
> > I see a getOutputCommitter method
> > <https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext)>
> > but no set method for the OutputCommitter.
> >
> > Could someone shed some light on how to change the OutputCommitter for
> > the tasks?
> >
> > Thank you,
> > Fran
> >
