Hi,
After 1.14.0, I think Flink should work well even at the 1000*1000 scale
with a 10s `akka.ask.timeout` in the deploy stage.
Thanks in advance for any further feedback after you investigate.

BTW, you might want to look at
https://issues.apache.org/jira/browse/FLINK-24295, which might be causing the
problem.

Best,
Guowei



On Mon, Jan 24, 2022 at 4:31 PM Paul Lam <paullin3...@gmail.com> wrote:

> Hi Guowei,
>
> Thanks a lot for your reply.
>
> I’m using 1.14.0. The timeout happens at job deployment time. A subtask
> runs for a short period (about `akka.ask.timeout`) before it fails due to
> the timeout.
>
> I noticed that the JobManager has very high CPU usage at that moment,
> around 2000%. I’m profiling it to find the cause.
>
> Best,
> Paul Lam
>
> On Jan 21, 2022, at 09:56, Guowei Ma <guowei....@gmail.com> wrote:
>
> Hi, Paul
>
> Could you share some more information, such as the Flink version you
> are using and the memory configured for the TM and JM?
> Also, when does the timeout happen: at the beginning of the job or while
> the job is running?
>
> Best,
> Guowei
>
>
> On Thu, Jan 20, 2022 at 4:45 PM Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi,
>>
>> I’m tuning a Flink job with 1000+ parallelism, which frequently fails
>> with Akka TimeOutException (it was fine with 200 parallelism).
>>
>> I see some posts recommend increasing `akka.ask.timeout` to 120s. I’m not
>> familiar with Akka, but that looks like a very long time for a response
>> timeout compared to the default 10s.
>>
>> So I’m wondering what a reasonable range for this option is. And why
>> would the Actor fail to respond in time (was the message dropped due to
>> pressure)?
>>
>> Any input would be appreciated! Thanks a lot.
>>
>> Best,
>> Paul Lam
>>
>>
>
