I see, I will check the taskmanager log.
Thank you Arvid.

Best regards
Rainie

On Wed, Feb 24, 2021 at 5:27 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Rainie,
>
> there are two probable causes:
> * Network instabilities
> * The taskmanager died. In that case, dig further into the taskmanager
> logs for errors right before that time.
>
> In both cases, Flink should restart the job, provided an appropriate
> restart strategy is configured.
>
> On Sat, Feb 20, 2021 at 10:07 PM Rainie Li <raini...@pinterest.com> wrote:
>
>> Hello,
>>
>> I launched a job with a larger load on a Hadoop YARN cluster.
>> The job finished after running for 5 hours. I didn't find any errors in
>> the JobManager log besides this connection exception:
>>
>> 2021-02-20 13:20:14,110 WARN  akka.remote.transport.netty.NettyTransport - Remote connection to [/10.1.57.146:48368] failed with java.io.IOException: Connection reset by peer
>> 2021-02-20 13:20:14,110 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@host:35241] has failed, address is now gated for [50] ms. Reason: [Disassociated]
>> 2021-02-20 13:20:14,110 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@host:39493] has failed, address is now gated for [50] ms. Reason: [Disassociated]
>> 2021-02-20 13:20:14,110 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@host:38481] has failed, address is now gated for [50] ms. Reason: [Disassociated]
>>
>> Any idea what caused the job to finish and how to resolve it?
>> Any suggestions are appreciated.
>>
>> Thanks
>> Best regards
>> Rainie
>>
>
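
Arvid's second suggestion, checking the taskmanager logs for errors right before the failure time, can be sketched as a small Python filter. This is a minimal sketch, not a Flink tool: it assumes each log line starts with a timestamp such as `2021-02-20 13:20:14,110` followed by the log level, as in the JobManager excerpt above, and `problems_before` is a hypothetical helper name. On YARN, the aggregated taskmanager logs can usually be fetched with `yarn logs -applicationId <application id>` and then fed to this function.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: "<yyyy-MM-dd HH:mm:ss,SSS> <LEVEL> <rest of message>",
# matching the JobManager excerpt quoted in this thread.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(.*)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def problems_before(lines, incident, window=timedelta(minutes=5),
                    levels=("WARN", "ERROR")):
    """Return (timestamp, level, message) tuples for lines at the given
    levels that were logged within `window` before (or at) `incident`.

    Hypothetical helper for illustration only.
    """
    start = incident - window
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # skip lines without a timestamp, e.g. stack-trace frames
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        level = match.group(2)
        if start <= ts <= incident and level in levels:
            hits.append((ts, level, match.group(3)))
    return hits


if __name__ == "__main__":
    # Made-up sample lines; org.example.* class names are hypothetical.
    sample = [
        "2021-02-20 13:10:00,000 INFO  org.example.Foo - all good",
        "2021-02-20 13:19:58,500 ERROR org.example.Bar - something broke",
        "2021-02-20 13:20:14,110 WARN  akka.remote.ReliableDeliverySupervisor - Association failed",
    ]
    incident = datetime(2021, 2, 20, 13, 20, 14, 110000)
    for ts, level, message in problems_before(sample, incident):
        print(ts, level, message)
```

Any iterable of lines works, including an open file handle. Because stack-trace continuation lines are skipped, widen `window` if the root cause may have been logged well before the connection reset.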
