xzw0223 created FLINK-31192:
-------------------------------

             Summary: dataGen takes too long to initialize under sequence
                 Key: FLINK-31192
                 URL: https://issues.apache.org/jira/browse/FLINK-31192
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.16.1, 1.16.0
            Reporter: xzw0223
             Fix For: 1.16.1, 1.16.0

The SequenceGenerator preloads all sequence values in open(). If the total number of elements is very large, initialization takes too long.

[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/datagen/SequenceGenerator.java#L91]

The reason is that the capacity of the Deque is doubled each time it fills up, and every expansion requires an array copy, which is time-consuming.

Here is what I propose:

Do not preload the full sequence. Instead, generate one value each time next() is called, which avoids the slow initialization caused by loading all the data up front.

Record the current sequence position in the checkpoint, and after an abnormal restart continue emitting from the recorded position to ensure fault tolerance.
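
To illustrate the proposal, here is a minimal sketch in plain Java. It is not Flink's SequenceGenerator and does not use any Flink API: the class name LazySequence and the methods snapshotPosition() and restore() are hypothetical names chosen for this example. It also assumes a single subtask and an inclusive end value smaller than Long.MAX_VALUE. The point is only that the state shrinks to a single long, so construction cost no longer depends on the size of the range.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical illustration of a lazily evaluated sequence; not Flink code. */
final class LazySequence implements Iterator<Long> {
    private final long end;  // inclusive upper bound of the sequence
    private long position;   // next value to emit; the only state worth checkpointing

    /** Starts a fresh sequence at {@code start}. No values are materialized here. */
    LazySequence(long start, long end) {
        this(start, end, start);
    }

    private LazySequence(long start, long end, long position) {
        if (end == Long.MAX_VALUE) {
            // Assumption of this sketch: keeps position + 1 from overflowing.
            throw new IllegalArgumentException("end must be smaller than Long.MAX_VALUE");
        }
        if (start > end) {
            throw new IllegalArgumentException("start must not be greater than end");
        }
        if (position < start || position > end + 1) {
            throw new IllegalArgumentException("position is outside the sequence");
        }
        this.end = end;
        this.position = position;
    }

    /** Rebuilds a sequence from a previously recorded position, e.g. after a restart. */
    static LazySequence restore(long start, long end, long checkpointedPosition) {
        return new LazySequence(start, end, checkpointedPosition);
    }

    @Override
    public boolean hasNext() {
        return position <= end;
    }

    /** Computes the next value on demand instead of reading it from a preloaded Deque. */
    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException("sequence exhausted");
        }
        return position++;
    }

    /** Returns the value that would be emitted next; this is what a checkpoint would store. */
    long snapshotPosition() {
        return position;
    }
}
```

In an actual fix, the recorded position would have to be stored in Flink's checkpointed state and the range would still need to be divided among the parallel subtasks; both are outside the scope of this sketch.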