yxu-valleytider opened a new pull request #10653: [FLINK-13027][streaming]: 
StreamingFileSink bulk-encoded writer supports customized checkpoint policy
URL: https://github.com/apache/flink/pull/10653
 
 
   ## What is the purpose of the change
   
   This PR allows a bulk-encoded `StreamingFileSink` to be instantiated with 
any member of a generic family of rolling policies that roll files at 
checkpoint time. It does so by defining a base `CheckpointRollingPolicy`, which 
is extended by the existing `OnCheckpointRollingPolicy` and by a new rolling 
policy, `FSizeCheckpointRollingPolicy`. The latter rolls a file not only at 
checkpoint time but also, between checkpoints, when the file size exceeds a 
certain limit, which is useful for preventing files from growing too big. The 
recurrent builder pattern described in 
[[1](https://community.oracle.com/blogs/emcmanus/2010/10/24/using-builder-pattern-subclasses)]
 and 
[[2](https://stackoverflow.com/questions/17164375/subclassing-a-java-builder-class)]
 is used to instantiate the rolling policies where appropriate, which keeps 
each individual rolling policy extensible as well.
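
For readers unfamiliar with the recurrent builder pattern referenced above, here is a minimal, self-contained sketch. It is not the code in this PR: `BasePolicy`, `SizePolicy`, `withName`, and `withMaxPartSize` are hypothetical names chosen for illustration, and the 128 MiB default is an arbitrary assumption. The point it shows is that the base builder is parameterized on its own subtype, so setters inherited from the base keep returning the concrete builder and subclass setters stay chainable.

```java
// Hypothetical stand-ins for illustration only; these are not the classes added by the PR.
abstract class BasePolicy {
    private final String name;

    protected BasePolicy(Builder<?> builder) {
        this.name = builder.name;
    }

    String getName() {
        return name;
    }

    // T is the concrete builder type, so inherited setters return the subclass builder.
    abstract static class Builder<T extends Builder<T>> {
        String name = "default";

        T withName(String name) {
            this.name = name;
            return self();
        }

        // Each concrete builder returns itself, which avoids an unchecked cast here.
        protected abstract T self();

        abstract BasePolicy build();
    }
}

final class SizePolicy extends BasePolicy {
    private final long maxPartSize;

    private SizePolicy(Builder builder) {
        super(builder);
        this.maxPartSize = builder.maxPartSize;
    }

    long getMaxPartSize() {
        return maxPartSize;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder extends BasePolicy.Builder<Builder> {
        // Assumed default of 128 MiB, picked only for this sketch.
        private long maxPartSize = 128L * 1024 * 1024;

        Builder withMaxPartSize(long maxPartSize) {
            this.maxPartSize = maxPartSize;
            return this;
        }

        @Override
        protected Builder self() {
            return this;
        }

        @Override
        SizePolicy build() {
            return new SizePolicy(this);
        }
    }
}
```

With this shape, `SizePolicy.builder().withName("bulk").withMaxPartSize(1024L).build()` compiles even though `withName` is declared on the base builder, because `withName` returns `SizePolicy.Builder` rather than the base type.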
   
   ## Brief change log
   
   **CheckpointRollingPolicy**
     - An abstract class implementing the base rolling policy, which rolls 
files at every checkpoint.
   
   **FSizeCheckpointRollingPolicy**
     - A new rolling policy implementation that rolls the part file when its 
size exceeds a limit, *in addition to* rolling at every checkpoint.
   
   **StreamingFileSink**
     - The bulk-encoded sink writer (`forBulkFormat()`) takes a generic 
`CheckpointRollingPolicy` during instantiation. `OnCheckpointRollingPolicy` is 
still the default, but is no longer the only option.
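
The class relationships in this change log can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `PartFileView` is a hypothetical stand-in for the part-file metadata that Flink passes to a rolling policy, the class names carry a `Sketch` suffix to mark them as invented, and the processing-time trigger is left out for brevity.

```java
// Hypothetical stand-in for part-file metadata; only the size is modelled here.
interface PartFileView {
    long getSize();
}

// Base policy sketch: subclasses may add triggers but cannot disable rolling on checkpoint.
abstract class CheckpointRollingSketch<IN> {

    final boolean shouldRollOnCheckpoint(PartFileView partFile) {
        return true;
    }

    abstract boolean shouldRollOnEvent(PartFileView partFile, IN element);
}

// Counterpart of the existing default: rolls at checkpoints only.
final class OnCheckpointOnlySketch<IN> extends CheckpointRollingSketch<IN> {

    @Override
    boolean shouldRollOnEvent(PartFileView partFile, IN element) {
        return false;
    }
}

// Counterpart of the size-based policy: also rolls once the part file exceeds a limit.
final class SizeCheckpointSketch<IN> extends CheckpointRollingSketch<IN> {

    private final long maxPartSize;

    SizeCheckpointSketch(long maxPartSize) {
        if (maxPartSize <= 0) {
            throw new IllegalArgumentException("maxPartSize must be positive");
        }
        this.maxPartSize = maxPartSize;
    }

    @Override
    boolean shouldRollOnEvent(PartFileView partFile, IN element) {
        return partFile.getSize() > maxPartSize;
    }
}
```

Marking `shouldRollOnCheckpoint` as `final` in the base class is one way to guarantee that every policy handed to a bulk-encoded sink still rolls at each checkpoint; whether the PR enforces this in the same way is not stated in the description.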
   
   ## Verifying this change
   
   This change is an interface change and is already covered by existing 
tests, such as `LocalStreamingFileSinkTest` and `BulkWriterTest`.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (yes) minor interface change to 
`StreamingFileSink`.
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
   
