Oh my. Thank you so much, Maciek! I added the proposed snippet from the page 
you linked[0] to all pipeline tests and they are all performing as before. 
Perfect.

Interestingly enough, I read about it in the changelog, but seem to have 
ignored it. Notably though, I would expect the following to mean, that this 
only happens with two-phase commit source functions: 'operators using the 
two-phase commit, the tasks would wait for the final checkpoint completed'[1], 
which is not the case with our custom SinkFunctions.

However I am happy, that it solves our problem. I hope it also helps Alexey.

Thank you again and all the best
  David


[0]: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#checkpointing-with-parts-of-the-graph-finished

    Configuration config = new Configuration();
    
config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, 
false);
    StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);

[1]: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit

> On 9. Sep 2022, at 13:33, Maciek Próchniak <m...@touk.pl> wrote:
> 
> Hi,
> 
> we also had similar problems in Nussknacker recently (tests on fake sources), 
> my colleague found out it's due to ENABLE_CHECKPOINTS_AFTER_FINISH flag 
> (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/#waiting-for-the-final-checkpoint-before-task-exit)
>  is set by default to true in 1.15. After the fake source ends, the job waits 
> for next checkpoint to be triggered before finishing. In the end we reduced 
> checkpoint interval in some places and disabled checkpoints altogether in 
> some other tests
> 
> maciek
> 
> On 09.09.2022 10:44, David Jost wrote:
>> Hey, sorry for not coming back to this earlier, but I was hoping to better 
>> isolate the problem for analysis.
>> 
>> Maybe for comparison to Alexey's case: We have four different pipelines at 
>> ours, which are all built similarly. Though we use Kafka in the actual jobs, 
>> the tests are using fake sources and sinks. Only the tests which use the 
>> MiniClusterWithClientResource are affected (we have one badly written test, 
>> which runs the pipeline without MiniCluster and it is not affected).
>> 
>> I analysed it a bit and identified, that the job would hang for about 30s 
>> after the last event has been pushed out the sink. Summing up these 30s for 
>> each (job) test results in the additional time used by all tests in the end.
>> So I would assume, that there is some kind of wind-down timer or so, which 
>> holds up everything.
>> 
>> I hope, this is helpful somehow. I would love to find the source of this 
>> issue. I was hoping to isolate the issue in an MWE, but have been 
>> unsuccessful for now.
>> 
>> NB: I tested the tests with both, MiniClusterWithClientResource (with 
>> adjustments for JUnit 5) and MiniClusterExtension, but there was no 
>> noticeable difference.
>> 
>>> On 7. Sep 2022, at 14:41, Alexey Trenikhun <yen...@msn.com> wrote:
>>> 
>>> The class contains single test method, which runs single job (the job is 
>>> quite complex), then waits for job being running after that waits for data 
>>> being populated in output topic, and this doesn't happen during 5 minutes 
>>> (test timeout). Tried under debugger, set breakpoint in Kafka record 
>>> deserializer it is hit but very slow, roughly 3 records per 5 minute (the 
>>> topic was pre-populated)
>>> 
>>> No table/sql API, only stream API
>>> From: Chesnay Schepler <ches...@apache.org>
>>> Sent: Wednesday, September 7, 2022 5:20 AM
>>> To: Alexey Trenikhun <yen...@msn.com>; David Jost <david.j...@uniberg.com>; 
>>> Matthias Pohl <matthias.p...@aiven.io>
>>> Cc: user@flink.apache.org <user@flink.apache.org>
>>> Subject: Re: Slow Tests in Flink 1.15
>>>  The test that gotten slow; how many test cases does it actually contain / 
>>> how many jobs does it actually run?
>>> Are these tests using the table/sql API?
>>> 
>>> On 07/09/2022 14:15, Alexey Trenikhun wrote:
>>>> We are also observing extreme slow down (5+ minutes vs 15 seconds) in 1 of 
>>>> 2 integration tests . Both tests use Kafka. The slow test uses 
>>>> org.apache.flink.runtime.minicluster.TestingMiniCluster, this test tests 
>>>> complete job, which consumes and produces Kafka messages. Not affected 
>>>> test extends org.apache.flink.test.util.AbstractTestBase which uses 
>>>> MiniClusterWithClientResource, this test is simpler and only produce Kafka 
>>>> messages.
>>>> 
>>>> Thanks,
>>>> Alexey
>>>> From: Matthias Pohl via user <user@flink.apache.org>
>>>> Sent: Tuesday, September 6, 2022 6:36 AM
>>>> To: David Jost <david.j...@uniberg.com>
>>>> Cc: user@flink.apache.org <user@flink.apache.org>
>>>> Subject: Re: Slow Tests in Flink 1.15
>>>>  Hi David,
>>>> I guess, you're referring to [1]. But as Chesnay already pointed out in 
>>>> the previous thread: It would be helpful to get more insights into what 
>>>> exactly your tests are executing (logs, code, ...). That would help 
>>>> identifying the cause.
>>>>> Can you give us a more complete stacktrace so we can see what call in
>>>>> Flink is waiting for something?
>>>>> 
>>>>> Does this happen to all of your tests?
>>>>> Can you provide us with an example that we can try ourselves? If not,
>>>>> can you describe the test structure (e.g., is it using a
>>>>> MiniClusterResource).
>>>> Matthias
>>>> 
>>>> [1] https://lists.apache.org/thread/yhhprwyf29kgypzzqdmjgft4qs25yyhk
>>>> 
>>>> On Mon, Sep 5, 2022 at 4:59 PM David Jost <david.j...@uniberg.com> wrote:
>>>> Hi,
>>>> 
>>>> we were going to upgrade our application from Flink 1.14.4 to Flink 
>>>> 1.15.2, when we noticed, that all our job tests, using a 
>>>> MiniClusterWithClientResource, are multiple times slower in 1.15 than 
>>>> before in 1.14. I, unfortunately, have not found mentions in that regard 
>>>> in the changelog or documentation. The slowdown is rather extreme I hope 
>>>> to find a solution to this. I saw it mentioned once in the mailing list, 
>>>> but there was no (public) outcome to it.
>>>> 
>>>> I would appreciate any help on this. Thank you in advance.
>>>> 
>>>> Best
>>>>  David

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to