Hi Wes,

Flink's own OutputFormats don't support compression, but we have some tools to use Hadoop's OutputFormats with Flink [1], and those support compression:
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
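On the file system concern: Hadoop's TextOutputFormat compresses by wrapping the output stream of each part file with the codec, rather than recompressing a file after it has been closed, so it should not matter which file system sits underneath. On the Hadoop side the relevant switches are FileOutputFormat.setCompressOutput(job, true) and FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class). Below is a minimal, JDK-only sketch of the stream-wrapping idea. It is not Flink or Hadoop code, and GzipLineWriter, writeLines and gunzip are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Flink or Hadoop.
public class GzipLineWriter {

    // Writes each record as one line into a gzip stream layered on top of `sink`.
    // `sink` stands in for the stream a file system hands out for one partition file.
    public static void writeLines(List<String> records, OutputStream sink) throws IOException {
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(sink), StandardCharsets.UTF_8)) {
            for (String record : records) {
                out.write(record);
                out.write('\n');
            }
        } // closing the writer finishes the gzip trailer and closes `sink`
    }

    // Decompresses gzip bytes back to text; only used here to check the round trip.
    public static String gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer plays the role of one partition file.
        ByteArrayOutputStream partitionFile = new ByteArrayOutputStream();
        writeLines(Arrays.asList("first record", "second record"), partitionFile);
        System.out.print(gunzip(partitionFile.toByteArray()));
    }
}
```

Because the compression lives in the stream, each parallel writer produces its own independently gzipped file, which matches the one-file-per-partition layout you described.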
Let me know if you need more information.

Regards,
Robert

[1]: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html

On Thu, Aug 18, 2016 at 2:13 AM, Wesley Kerr <wesley.n.k...@gmail.com> wrote:

> Hello -
>
> Forgive me if this has been asked before, but I'm trying to determine the
> best way to add compression to DataSink Outputs (starting with
> TextOutputFormat). Realistically I would like each partition file (based
> on parallelism) to be compressed independently with gzip, but am open to
> other solutions.
>
> My first thought was to extend TextOutputFormat with a new class that
> compresses after closing and before returning, but I'm not sure that would
> work across all possible file systems (S3, Local, and HDFS).
>
> Any thoughts?
>
> Thanks!
>
> Wes