Re: Very long launch of the Flink application in BATCH mode

2023-07-03 Thread Brendan Cortez
Thanks, everyone. I tried version 1.17.1, but the problem remains. It
seems to be a bug, so I created an issue:
https://issues.apache.org/jira/browse/FLINK-32513

On Thu, 29 Jun 2023 at 10:57, Martijn Visser 
wrote:

> Hi Vladislav,
>
> I think it might be worthwhile to upgrade to Flink 1.17, given the
> improvements that have been made in Flink 1.16 and 1.17 on batch
> processing. See for example the release notes of 1.17, with an entire
> section on batch processing
> https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/#batch-processing
>
> Best regards,
>
> Martijn


Re: Very long launch of the Flink application in BATCH mode

2023-06-29 Thread Martijn Visser
Hi Vladislav,

I think it might be worthwhile to upgrade to Flink 1.17, given the
improvements that have been made in Flink 1.16 and 1.17 on batch
processing. See for example the release notes of 1.17, with an entire
section on batch processing
https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/#batch-processing

Best regards,

Martijn

On Wed, Jun 28, 2023 at 7:27 PM Vladislav Keda 
wrote:

> Hi Shammon,
>
> With log.level=DEBUG, the only additional log output is:
>
> 2023-06-21 14:51:30,921 DEBUG org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Trigger heartbeat request.
>
> The job freezes during stream graph generation. In STREAMING mode, the
> same job starts quickly, without this problem.


Re: Very long launch of the Flink application in BATCH mode

2023-06-28 Thread Vladislav Keda
Hi Shammon,

With log.level=DEBUG, the only additional log output is:

2023-06-21 14:51:30,921 DEBUG org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Trigger heartbeat request.

The job freezes during stream graph generation. In STREAMING mode, the
same job starts quickly, without this problem.

On Wed, 28 Jun 2023 at 06:44, Shammon FY wrote:

> Hi Brendan,
>
> I think you need to confirm at which stage the job is blocked: is the
> client still submitting the job, is the ResourceManager scheduling it, or
> are the tasks launching in the TaskManagers? You may need to provide more
> information to help us figure out the issue.
>
> Best,
> Shammon FY



Re: Very long launch of the Flink application in BATCH mode

2023-06-27 Thread Shammon FY
Hi Brendan,

I think you need to confirm at which stage the job is blocked: is the
client still submitting the job, is the ResourceManager scheduling it, or
are the tasks launching in the TaskManagers? You may need to provide more
information to help us figure out the issue.

Best,
Shammon FY
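
One way to see which stage is blocked, sketched here as an assumption rather than anything proposed in the thread, is to sample the stack of the thread that runs the job's main method while it appears frozen. `MainThreadSampler` is a hypothetical helper built only on JDK classes, not a Flink API; a thread dump taken with the JDK's `jstack` tool would give similar information without code changes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical diagnostic helper (not part of Flink). */
final class MainThreadSampler {

    /** Formats the thread's name, state, and current stack frames. */
    static String describe(Thread thread) {
        StringBuilder out = new StringBuilder();
        out.append('"').append(thread.getName()).append("\" ").append(thread.getState());
        for (StackTraceElement frame : thread.getStackTrace()) {
            out.append(System.lineSeparator()).append("\tat ").append(frame);
        }
        return out.toString();
    }

    /** Prints the target thread's stack every periodSeconds from a daemon thread. */
    static ScheduledExecutorService start(Thread target, long periodSeconds) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread worker = new Thread(runnable, "main-thread-sampler");
                    worker.setDaemon(true); // must not keep the JVM alive
                    return worker;
                });
        scheduler.scheduleAtFixedRate(
                () -> System.err.println(describe(target)),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService sampler = start(Thread.currentThread(), 1);
        Thread.sleep(2500); // stands in for the slow part of the job's main()
        sampler.shutdownNow();
    }
}
```

If the same frames show up in every sample during the pause, that is a strong hint about where the time goes.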

On Tuesday, June 27, 2023, Weihua Hu  wrote:

> Hi, Brendan
>
> Judging by the log, the job is still invoking your main method. You can
> add more logging to the main method to find out which part takes too long.
>
> Best,
> Weihua


Re: Very long launch of the Flink application in BATCH mode

2023-06-27 Thread Weihua Hu
Hi, Brendan

Judging by the log, the job is still invoking your main method. You can
add more logging to the main method to find out which part takes too long.

Best,
Weihua
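
A minimal sketch of that suggestion, assuming the main method can be split into a few named steps. `StageTimer` and the stage names are hypothetical and not part of Flink; the lambdas are placeholders for the real source, lookup, and sink construction, and a real job would write to its own logger rather than `System.out`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not a Flink API): logs how long each step of main() takes. */
final class StageTimer {
    // Elapsed milliseconds per stage, in the order the stages ran.
    private static final Map<String, Long> ELAPSED_MS = new LinkedHashMap<>();

    /** Runs one stage, logs its start and end, and records its duration. */
    static <T> T time(String stage, Supplier<T> body) {
        System.out.println("START " + stage);
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long ms = (System.nanoTime() - start) / 1_000_000;
            ELAPSED_MS.put(stage, ms);
            System.out.println("END   " + stage + " after " + ms + " ms");
        }
    }

    static Map<String, Long> elapsedMs() {
        return ELAPSED_MS;
    }

    public static void main(String[] args) {
        // Placeholder stages; in a real job these would wrap the source,
        // lookup, and sink construction and the final execute() call.
        String source = time("build source", () -> "collection-source");
        Integer lookups = time("build 20 lookups", () -> 20);
        time("build sink", () -> source + " -> " + lookups + " lookups -> sink");
        System.out.println(elapsedMs());
    }
}
```

Wrapping the final execute call in its own stage would separate the time spent building the pipeline from the time spent generating the stream graph.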


On Tue, Jun 27, 2023 at 5:06 AM Brendan Cortez 
wrote:

> No, I'm using a collection source, 20 identical JDBC lookups, and a Kafka sink.


Re: Very long launch of the Flink application in BATCH mode

2023-06-26 Thread Brendan Cortez
No, I'm using a collection source, 20 identical JDBC lookups, and a Kafka sink.

On Mon, 26 Jun 2023 at 19:17, Yaroslav Tkachenko 
wrote:

> Hey Brendan,
>
> Do you use a file source by any chance?


Re: Very long launch of the Flink application in BATCH mode

2023-06-26 Thread Yaroslav Tkachenko
Hey Brendan,

Do you use a file source by any chance?

On Mon, Jun 26, 2023 at 4:31 AM Brendan Cortez 
wrote:

> Hi all!
>
> I'm trying to submit a Flink job in Application Mode on a Kubernetes
> cluster.
>
> When an application has a large number of operators (more than 20
> identical operators), it freezes for about 6 minutes between this log line:
>
> 2023-06-21 15:46:45,082 WARN  org.apache.flink.connector.kafka.sink.KafkaSinkBuilder [] - Property [transaction.timeout.ms] not specified. Setting it to PT1H
>
> and this one:
>
> 2023-06-21 15:53:20,002 INFO  org.apache.flink.streaming.api.graph.StreamGraphGenerator [] - Disabled Checkpointing. Checkpointing is not supported and not needed when executing jobs in BATCH mode.
>
> (Logs are attached.)
>
> When I set log.level=DEBUG, I see only this message, every 10 seconds:
>
> 2023-06-21 14:51:30,921 DEBUG org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Trigger heartbeat request.
>
> Could you please help me understand the cause of this problem and how to
> fix it? I am using Flink 1.15.3.
>
> Thank you in advance!
>
> Best regards,
> Brendan Cortez.
>


Very long launch of the Flink application in BATCH mode

2023-06-26 Thread Brendan Cortez
Hi all!

I'm trying to submit a Flink job in Application Mode on a Kubernetes
cluster.

When an application has a large number of operators (more than 20
identical operators), it freezes for about 6 minutes between this log line:

2023-06-21 15:46:45,082 WARN  org.apache.flink.connector.kafka.sink.KafkaSinkBuilder [] - Property [transaction.timeout.ms] not specified. Setting it to PT1H

and this one:

2023-06-21 15:53:20,002 INFO  org.apache.flink.streaming.api.graph.StreamGraphGenerator [] - Disabled Checkpointing. Checkpointing is not supported and not needed when executing jobs in BATCH mode.

(Logs are attached.)

When I set log.level=DEBUG, I see only this message, every 10 seconds:

2023-06-21 14:51:30,921 DEBUG org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Trigger heartbeat request.

Could you please help me understand the cause of this problem and how to
fix it? I am using Flink 1.15.3.

Thank you in advance!

Best regards,
Brendan Cortez.
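
For reference, the length of the pause can be worked out from the two timestamps quoted above. The sketch below uses only the JDK; `LogGap` is a hypothetical name, and the pattern string assumes the timestamp layout shown in the log lines. It also converts the `PT1H` value from the Kafka sink warning, which is an ISO-8601 duration, into milliseconds.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for reading the timestamps in the quoted log lines. */
final class LogGap {
    // Timestamp layout used in the log lines, e.g. "2023-06-21 15:46:45,082".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the time elapsed between two log timestamps. */
    static Duration between(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, FORMAT),
                LocalDateTime.parse(later, FORMAT));
    }

    public static void main(String[] args) {
        Duration gap = between("2023-06-21 15:46:45,082", "2023-06-21 15:53:20,002");
        System.out.println("gap  = " + gap.toMillis() + " ms");
        System.out.println("PT1H = " + Duration.parse("PT1H").toMillis() + " ms");
    }
}
```

The gap comes to 394,920 ms, or roughly 6 minutes 35 seconds, and `PT1H` is 3,600,000 ms (one hour).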


flink-k8s-app.log
Description: Binary data