Hi, 

I’d like to sink my data into HDFS using SequenceFileAsBinaryOutputFormat with 
compression, and I found an approach at 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/hadoop_compatibility.html.
The code below works, but I’m curious: since it creates a MapReduce Job 
instance, does this Flink application create and run a MapReduce job 
underneath? If so, will it hurt performance?

I tried to figure it out by looking at the logs, but couldn’t find a clue. I 
hope someone can shed some light here. Thank you.

import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileAsBinaryOutputFormat;

// Create a Hadoop Job instance to hold the output configuration.
Job job = Job.getInstance();
HadoopOutputFormat<BytesWritable, BytesWritable> hadoopOF =
        new HadoopOutputFormat<>(new SequenceFileAsBinaryOutputFormat(), job);

// Enable block-level compression for the sequence file output.
hadoopOF.getConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
hadoopOF.getConfiguration().set("mapreduce.output.fileoutputformat.compress.type",
        CompressionType.BLOCK.toString());
FileOutputFormat.setOutputPath(job, new Path("hdfs://..."));
dataset.output(hadoopOF);
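
In case the surrounding context helps: the dataset feeding this sink has to be 
a DataSet<Tuple2<BytesWritable, BytesWritable>>. Here is a simplified sketch of 
the driver around the snippet above (the sample elements are just 
placeholders, not my real data):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Placeholder elements; the real dataset comes from upstream transformations.
DataSet<Tuple2<BytesWritable, BytesWritable>> dataset = env.fromElements(
        Tuple2.of(new BytesWritable("key".getBytes()),
                  new BytesWritable("value".getBytes())));

// hadoopOF is the configured HadoopOutputFormat from the snippet above.
dataset.output(hadoopOF);
env.execute("sequence file sink");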

