Re: Compress DataSink Output

Robert Metzger Fri, 19 Aug 2016 06:15:45 -0700

Hi Wes,

Flink's own OutputFormats don't support compression, but Flink has a
compatibility layer for using Hadoop's OutputFormats [1], and those support
compression (see setCompressOutput and setOutputCompressorClass):
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
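
To make "each partition file compressed independently with gzip" concrete,
here is a rough, JDK-only sketch. It is not Flink or Hadoop API: the class
GzipPartitionSketch, its method names, and the "<partition>.gz" file naming
are all made up for illustration. The point is that wrapping the output
stream in a GZIPOutputStream compresses while writing, so there is no
separate "compress after closing" step and the approach only needs an
OutputStream from whatever file system is in use.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not a Flink or Hadoop class.
class GzipPartitionSketch {

    // Writes one partition's records to <dir>/<partition>.gz, one record per line.
    // The data is compressed as it is written; no uncompressed file is created.
    static Path writePartition(Path dir, int partition, List<String> records) {
        Path file = dir.resolve(partition + ".gz");
        try {
            Files.createDirectories(dir);
            try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(file)),
                    StandardCharsets.UTF_8))) {
                for (String record : records) {
                    out.write(record);
                    out.write('\n');
                }
            } // closing the writer also finishes the gzip stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    // Reads a gzip-compressed partition file back into a list of lines.
    static List<String> readPartition(Path file) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                records.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two "partitions", each written as its own independent gzip file.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "gzip-partition-sketch");
        Path first = writePartition(dir, 1, Arrays.asList("a", "b"));
        Path second = writePartition(dir, 2, Arrays.asList("c"));
        System.out.println(first + " -> " + readPartition(first));
        System.out.println(second + " -> " + readPartition(second));
    }
}
```

Each file produced this way can be decompressed on its own with gunzip. The
Hadoop OutputFormats linked above do this kind of stream wrapping for you
through their compression codecs, so I would try those first before writing
anything custom.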


Let me know if you need more information.

Regards,
Robert

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html


On Thu, Aug 18, 2016 at 2:13 AM, Wesley Kerr <wesley.n.k...@gmail.com>
wrote:

> Hello -
>
> Forgive me if this has been asked before, but I'm trying to determine the
> best way to add compression to DataSink Outputs (starting with
> TextOutputFormat).  Realistically I would like each partition file (based
> on parallelism) to be compressed independently with gzip, but am open to
> other solutions.
>
> My first thought was to extend TextOutputFormat with a new class that
> compresses after closing and before returning, but I'm not sure that would
> work across all possible file systems (S3, Local, and HDFS).
>
> Any thoughts?
>
> Thanks!
>
> Wes
>
>
>
