Hi Andreas,

I think the following may be what you are looking for.


1. For writing Avro, I think you can extend AvroOutputFormat and override the
getDirectoryFileName() method to customize the file name, as shown below.
Javadoc for AvroOutputFormat:
https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/formats/avro/AvroOutputFormat.html


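        import java.io.IOException;
        import org.apache.flink.core.fs.Path;
        import org.apache.flink.formats.avro.AvroOutputFormat;

        public static class CustomAvroOutputFormat<E> extends AvroOutputFormat<E> {
                public CustomAvroOutputFormat(Path filePath, Class<E> type) {
                        super(filePath, type);
                }

                public CustomAvroOutputFormat(Class<E> type) {
                        super(type);
                }

                @Override
                public void open(int taskNumber, int numTasks) throws IOException {
                        // Always write into a directory, so that getDirectoryFileName()
                        // is consulted even when the parallelism is 1.
                        this.setOutputDirectoryMode(OutputDirectoryMode.ALWAYS);
                        super.open(taskNumber, numTasks);
                }

                @Override
                protected String getDirectoryFileName(int taskNumber) {
                        // TODO: build and return the custom file name here (it should
                        // differ per taskNumber); null is only a placeholder.
                        return null;
                }
        }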
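
Since the goal is a UUID in the file name, the body of getDirectoryFileName()
could be filled in roughly as follows. This is only a sketch: the class
UuidFileNames, the method fileNameFor() and the "<uuid>-<task>.avro" pattern
are hypothetical names made up for illustration and are not part of Flink. It
is plain Java, so it can be tried without a cluster.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper (not Flink API): builds a per-task file name that contains a UUID.
final class UuidFileNames {

    private UuidFileNames() {}

    // taskNumber is the value Flink passes to getDirectoryFileName(int).
    // It is zero-padded and appended to the UUID so that parallel tasks
    // sharing the same UUID still produce distinct names.
    static String fileNameFor(UUID id, int taskNumber, String extension) {
        if (taskNumber < 0) {
            throw new IllegalArgumentException("taskNumber must be >= 0: " + taskNumber);
        }
        return String.format(Locale.ROOT, "%s-%05d%s", id, taskNumber, extension);
    }

    public static void main(String[] args) {
        // Prints one example name; the UUID part differs on every run.
        System.out.println(fileNameFor(UUID.randomUUID(), 1, ".avro"));
    }
}
```

Inside CustomAvroOutputFormat, getDirectoryFileName(taskNumber) could then
return UuidFileNames.fileNameFor(uuid, taskNumber, ".avro"), where uuid is a
field of the format (an assumption of this sketch). If that field is created
once on the client, I would expect every task to see the same value, which is
why the task number is kept in the name.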


2. For writing Parquet, you can refer to ParquetStreamingFileSinkITCase,
StreamingFileSink#forBulkFormat and DateTimeBucketAssigner. You can create a
class that implements the BucketAssigner interface and return a custom name
from the getBucketId() method. Note that, as far as I can tell, the value
returned by getBucketId() is used as the name of the bucket directory under
the base path rather than as the part file name itself, so the UUID would end
up in the path of the files.


ParquetStreamingFileSinkITCase:  
https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/ParquetStreamingFileSinkITCase.java


StreamingFileSink#forBulkFormat: 
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.java


DateTimeBucketAssigner: 
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.java
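
To make the BucketAssigner idea more concrete, here is a rough sketch of the
string a custom getBucketId() might compute. It is plain Java with no Flink
dependency; the class UuidBucketIds and the method bucketIdFor() are
hypothetical names, and the "yyyy-MM-dd--HH" pattern is meant to mirror the
default of DateTimeBucketAssigner as I remember it, so please check it against
the source linked above.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper (not Flink API): computes an hourly bucket id that also carries a UUID.
final class UuidBucketIds {

    // Assumed to match the default pattern of DateTimeBucketAssigner.
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH", Locale.ROOT);

    private UuidBucketIds() {}

    // Formats the given time in the given zone and appends the UUID.
    static String bucketIdFor(UUID id, Instant time, ZoneId zone) {
        return HOURLY.withZone(zone).format(time) + "-" + id;
    }

    public static void main(String[] args) {
        // Prints one example id; the value depends on the clock and the random UUID.
        System.out.println(bucketIdFor(UUID.randomUUID(), Instant.now(), ZoneOffset.UTC));
    }
}
```

In a real BucketAssigner, the Instant would presumably be derived from the
Context handed to getBucketId(), for example via
Instant.ofEpochMilli(context.currentProcessingTime()), and the UUID would be a
field of the assigner (both are assumptions of this sketch).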




Best,
Haibo

At 2019-07-02 04:15:07, "Hailu, Andreas" <andreas.ha...@gs.com> wrote:


Hello Flink team,

I’m writing Avro and Parquet files to HDFS, and I would like to include a
UUID as a part of the file name.

Our files in HDFS currently follow this pattern:

tmp-r-00001.snappy.parquet
tmp-r-00002.snappy.parquet
...

I’m using a custom output format which extends RichOutputFormat. Is this
something which is natively supported? If so, could you please recommend how
this could be done, or share the relevant documentation?

Best,
Andreas



