Hi Fabian,

Thank you for the clarification.
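
Just to double-check my own understanding, here is a rough, hypothetical sketch of the
wrapping idea you described. It is NOT Flink's actual HadoopOutputFormat code (the class
name NaiveHadoopOutputWrapper is my own invention), and it ignores config serialization
and output committing, but if I read you correctly the real wrapper does essentially this:
implement Flink's OutputFormat interface and delegate each record to the Hadoop
RecordWriter with a plain in-process method call, so nothing is ever submitted to a
Hadoop cluster.

import java.io.IOException;

import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// Hypothetical sketch, not Flink's real wrapper: shows delegation only,
// skips serialization of the Hadoop config and the output committer.
public class NaiveHadoopOutputWrapper<K, V> implements OutputFormat<Tuple2<K, V>> {

    private final org.apache.hadoop.mapreduce.OutputFormat<K, V> hadoopFormat;
    private final Job job;

    private transient RecordWriter<K, V> writer;
    private transient TaskAttemptContext context;

    public NaiveHadoopOutputWrapper(
            org.apache.hadoop.mapreduce.OutputFormat<K, V> hadoopFormat, Job job) {
        this.hadoopFormat = hadoopFormat;
        this.job = job;
    }

    @Override
    public void configure(Configuration parameters) {
        // Nothing to do here: all settings live in the Hadoop Configuration inside 'job'.
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        try {
            // Each parallel Flink subtask poses as one Hadoop task attempt,
            // so the Hadoop OutputFormat produces one output file per subtask.
            TaskAttemptID attempt = new TaskAttemptID("flink", 0, TaskType.MAP, taskNumber, 0);
            context = new TaskAttemptContextImpl(job.getConfiguration(), attempt);
            writer = hadoopFormat.getRecordWriter(context);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void writeRecord(Tuple2<K, V> record) throws IOException {
        try {
            writer.write(record.f0, record.f1); // plain in-process call, no MapReduce job
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            writer.close(context);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}

If that reading is right, the real HadoopOutputFormat just adds proper config
serialization and an output committer on top of this, which would also explain why
I couldn't find anything about a MapReduce job in the logs.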

Best,
Morven Huang

On Wed, Apr 10, 2019 at 9:57 PM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> Flink's Hadoop compatibility layer simply wraps functions that were
> implemented against Hadoop's interfaces in wrapper functions implemented
> against Flink's interfaces.
> No Hadoop cluster is started and no MapReduce job is executed.
>
> Job is just a class of the Hadoop API; creating an instance does not imply
> that a Hadoop job is executed.
>
> Best,
> Fabian
>
>
>
> On Wed, Apr 10, 2019 at 3:12 PM Morven Huang <
> morven.hu...@gmail.com> wrote:
>
>> Hi,
>>
>>
>>
>> I’d like to sink my data into HDFS using
>> SequenceFileAsBinaryOutputFormat with compression, and I found a way via
>> the link
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/hadoop_compatibility.html.
>> The code works, but I’m curious to know: since it creates a MapReduce Job
>> instance here, would this Flink application create and run a MapReduce job
>> underneath? If so, will it kill performance?
>>
>>
>>
>> I tried to figure this out by looking into the logs, but couldn’t find a
>> clue. I hope someone can shed some light here. Thank you.
>>
>>
>>
>> Job job = Job.getInstance();
>>
>> HadoopOutputFormat<BytesWritable, BytesWritable> hadoopOF =
>>         new HadoopOutputFormat<BytesWritable, BytesWritable>(
>>                 new SequenceFileAsBinaryOutputFormat(), job);
>>
>> hadoopOF.getConfiguration().set(
>>         "mapreduce.output.fileoutputformat.compress", "true");
>> hadoopOF.getConfiguration().set(
>>         "mapreduce.output.fileoutputformat.compress.type",
>>         CompressionType.BLOCK.toString());
>>
>> TextOutputFormat.setOutputPath(job, new Path("hdfs://..."));
>>
>> dataset.output(hadoopOF);
>>
>
