Oh my. Thank you so much, Maciek! I added the proposed snippet from the page you linked[0] to all pipeline tests and they are all performing as before. Perfect.
Interestingly enough, I read about it in the changelog, but seem to have ignored it. Notably though, I would expect the following to mean, that this only happens with two-phase commit source functions: 'operators using the two-phase commit, the tasks would wait for the final checkpoint completed'[1], which is not the case with our custom SinkFunctions. However I am happy, that it solves our problem. I hope it also helps Alexey. Thank you again and all the best David [0]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#checkpointing-with-parts-of-the-graph-finished Configuration config = new Configuration(); config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, false); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config); [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit > On 9. Sep 2022, at 13:33, Maciek Próchniak <m...@touk.pl> wrote: > > Hi, > > we also had similar problems in Nussknacker recently (tests on fake sources), > my colleague found out it's due to ENABLE_CHECKPOINTS_AFTER_FINISH flag > (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit) > is set by default to true in 1.15. After the fake source ends, the job waits > for next checkpoint to be triggered before finishing. In the end we reduced > checkpoint interval in some places and disabled checkpoints altogether in > some other tests > > maciek > > On 09.09.2022 10:44, David Jost wrote: >> Hey, sorry for not coming back to this earlier, but I was hoping to better >> isolate the problem for analysis. >> >> Maybe for comparison to Alexey's case: We have four different pipelines at >> ours, which are all built similarly. Though we use Kafka in the actual jobs, >> the tests are using fake sources and sinks. Only the tests which use the >> MiniClusterWithClientResource are affected (we have one badly written test, >> which runs the pipeline without MiniCluster and it is not affected). >> >> I analysed it a bit and identified, that the job would hang for about 30s >> after the last event has been pushed out the sink. Summing up these 30s for >> each (job) test results in the additional time used by all tests in the end. >> So I would assume, that there is some kind of wind-down timer or so, which >> holds up everything. >> >> I hope, this is helpful somehow. I would love to find the source of this >> issue. I was hoping to isolate the issue in an MWE, but have been >> unsuccessful for now. >> >> NB: I tested the tests with both, MiniClusterWithClientResource (with >> adjustments for JUnit 5) and MiniClusterExtension, but there was no >> noticeable difference. >> >>> On 7. Sep 2022, at 14:41, Alexey Trenikhun <yen...@msn.com> wrote: >>> >>> The class contains single test method, which runs single job (the job is >>> quite complex), then waits for job being running after that waits for data >>> being populated in output topic, and this doesn't happen during 5 minutes >>> (test timeout). Tried under debugger, set breakpoint in Kafka record >>> deserializer it is hit but very slow, roughly 3 records per 5 minute (the >>> topic was pre-populated) >>> >>> No table/sql API, only stream API >>> From: Chesnay Schepler <ches...@apache.org> >>> Sent: Wednesday, September 7, 2022 5:20 AM >>> To: Alexey Trenikhun <yen...@msn.com>; David Jost <david.j...@uniberg.com>; >>> Matthias Pohl <matthias.p...@aiven.io> >>> Cc: user@flink.apache.org <user@flink.apache.org> >>> Subject: Re: Slow Tests in Flink 1.15 >>> The test that gotten slow; how many test cases does it actually contain / >>> how many jobs does it actually run? >>> Are these tests using the table/sql API? >>> >>> On 07/09/2022 14:15, Alexey Trenikhun wrote: >>>> We are also observing extreme slow down (5+ minutes vs 15 seconds) in 1 of >>>> 2 integration tests . Both tests use Kafka. The slow test uses >>>> org.apache.flink.runtime.minicluster.TestingMiniCluster, this test tests >>>> complete job, which consumes and produces Kafka messages. Not affected >>>> test extends org.apache.flink.test.util.AbstractTestBase which uses >>>> MiniClusterWithClientResource, this test is simpler and only produce Kafka >>>> messages. >>>> >>>> Thanks, >>>> Alexey >>>> From: Matthias Pohl via user <user@flink.apache.org> >>>> Sent: Tuesday, September 6, 2022 6:36 AM >>>> To: David Jost <david.j...@uniberg.com> >>>> Cc: user@flink.apache.org <user@flink.apache.org> >>>> Subject: Re: Slow Tests in Flink 1.15 >>>> Hi David, >>>> I guess, you're referring to [1]. But as Chesnay already pointed out in >>>> the previous thread: It would be helpful to get more insights into what >>>> exactly your tests are executing (logs, code, ...). That would help >>>> identifying the cause. >>>>> Can you give us a more complete stacktrace so we can see what call in >>>>> Flink is waiting for something? >>>>> >>>>> Does this happen to all of your tests? >>>>> Can you provide us with an example that we can try ourselves? If not, >>>>> can you describe the test structure (e.g., is it using a >>>>> MiniClusterResource). >>>> Matthias >>>> >>>> [1] https://lists.apache.org/thread/yhhprwyf29kgypzzqdmjgft4qs25yyhk >>>> >>>> On Mon, Sep 5, 2022 at 4:59 PM David Jost <david.j...@uniberg.com> wrote: >>>> Hi, >>>> >>>> we were going to upgrade our application from Flink 1.14.4 to Flink >>>> 1.15.2, when we noticed, that all our job tests, using a >>>> MiniClusterWithClientResource, are multiple times slower in 1.15 than >>>> before in 1.14. I, unfortunately, have not found mentions in that regard >>>> in the changelog or documentation. The slowdown is rather extreme I hope >>>> to find a solution to this. I saw it mentioned once in the mailing list, >>>> but there was no (public) outcome to it. >>>> >>>> I would appreciate any help on this. Thank you in advance. >>>> >>>> Best >>>> David
smime.p7s
Description: S/MIME cryptographic signature