[jira] [Commented] (FLINK-14048) Flink client hangs after trying to kill Yarn Job during deployment

2019-10-23 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957691#comment-16957691
 ] 

Zili Chen commented on FLINK-14048:
---

Thanks for the update, [~gyfora]. I think you're right. I found an earlier 
report, FLINK-10435, which has a more detailed description, so I have closed 
this one as a duplicate.

> Flink client hangs after trying to kill Yarn Job during deployment
> --
>
> Key: FLINK-14048
> URL: https://issues.apache.org/jira/browse/FLINK-14048
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission, Deployment / YARN
>Reporter: Gyula Fora
>Priority: Major
> Attachments: patch.diff
>
>
> If we kill the flink client run command from the terminal while deploying to 
> YARN (let's say we realize we used the wrong parameters), the YARN 
> application will be killed immediately but the client won't shut down.
> We get the following messages over and over:
> 19/09/10 23:35:55 INFO retry.RetryInvocationHandler: java.io.IOException: The 
> client is stopped, while invoking 
> ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 14 
> failover attempts. Trying to failover after sleeping for 16296ms.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14048) Flink client hangs after trying to kill Yarn Job during deployment

2019-10-23 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957670#comment-16957670
 ] 

Gyula Fora commented on FLINK-14048:


[~tison] I think your patch doesn't fix the problem. It looks like a bug in the 
AbstractYarnClusterDescriptor when it tries to kill the already failed app.

 

19/10/23 01:50:50 INFO yarn.AbstractYarnClusterDescriptor: Cancelling deployment from Deployment Failure Hook
19/10/23 01:50:50 INFO yarn.AbstractYarnClusterDescriptor: Killing YARN application
19/10/23 01:50:50 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Trying to failover immediately.
19/10/23 01:50:50 INFO retry.RetryInvocationHandler: java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 1 failover attempts. Trying to failover after sleeping for 40495ms.
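
The log above shows the kill call being retried against a client that is 
already stopped, which keeps the shutdown hook from finishing. As an 
illustration only (this is not the attached patch and not the fix tracked in 
FLINK-10435), one way to keep a cleanup step from blocking forever is to run 
it with a time limit. The names BoundedKillSketch, KillAction and 
killWithTimeout are hypothetical and do not exist in Flink or Hadoop.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedKillSketch {

    /** Hypothetical stand-in for "kill the YARN application". */
    public interface KillAction {
        void kill() throws Exception;
    }

    /**
     * Runs the kill action on a daemon thread and waits at most timeoutMillis.
     * Returns true only if the action completed normally within the limit.
     */
    public static boolean killWithTimeout(KillAction action, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "kill-app");
            t.setDaemon(true); // a stuck kill must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(() -> {
            action.kill();
            return null;
        });
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the retry loop and give up
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the kill itself failed, e.g. the client is stopped
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A kill that returns at once versus one that keeps sleeping and retrying.
        System.out.println("fast kill finished: " + killWithTimeout(() -> { }, 1000));
        System.out.println("stuck kill finished: "
                + killWithTimeout(() -> Thread.sleep(60_000), 100));
    }
}
```

Whether a time limit is the right behaviour here depends on how the YARN 
client is configured; the sketch only shows that the caller can regain 
control when the underlying call never returns.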



[jira] [Commented] (FLINK-14048) Flink client hangs after trying to kill Yarn Job during deployment

2019-09-11 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927372#comment-16927372
 ] 

TisonKun commented on FLINK-14048:
--

[~gyfora] This also looks like a duplicate of FLINK-13895. Could you please 
check whether the two issues share the same root cause?



[jira] [Commented] (FLINK-14048) Flink client hangs after trying to kill Yarn Job during deployment

2019-09-11 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927350#comment-16927350
 ] 

TisonKun commented on FLINK-14048:
--

I tried to refactor the code for proper exception handling. Could you apply the 
attached patch and see whether it addresses the issue?



[jira] [Commented] (FLINK-14048) Flink client hangs after trying to kill Yarn Job during deployment

2019-09-11 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927313#comment-16927313
 ] 

Gyula Fora commented on FLINK-14048:


Yes, it was in per-job mode.



[jira] [Commented] (FLINK-14048) Flink client hangs after trying to kill Yarn Job during deployment

2019-09-11 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927308#comment-16927308
 ] 

TisonKun commented on FLINK-14048:
--

[~gyfora] did you notice this problem when deploying a per-job cluster? I found 
the relevant code snippet in {{CliFrontend#runProgram}}, and it seems that when 
an exception is thrown (in this case, one caused by a signal) we don't close 
the {{ClusterClient}} properly. But it should only happen in per-job mode.
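
To illustrate the pattern being described, here is a minimal sketch of 
closing a client even when submission throws. It does not reproduce the code 
in {{CliFrontend#runProgram}}; FakeClusterClient and this runProgram method 
are hypothetical stand-ins, and the exception simply simulates an interrupted 
deployment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseOnFailureSketch {

    /** Hypothetical stand-in for a cluster client; not Flink's ClusterClient. */
    public static final class FakeClusterClient implements AutoCloseable {
        public final AtomicBoolean closed = new AtomicBoolean(false);

        public void submit() {
            // Simulates the deployment being interrupted, e.g. by a signal.
            throw new IllegalStateException("deployment interrupted");
        }

        @Override
        public void close() {
            closed.set(true);
        }
    }

    /** Runs a submission and returns the client so callers can inspect it. */
    public static FakeClusterClient runProgram() {
        FakeClusterClient client = new FakeClusterClient();
        // try-with-resources closes the client on both success and failure.
        try (FakeClusterClient c = client) {
            c.submit();
        } catch (IllegalStateException e) {
            // The client has already been closed when this handler runs.
            System.out.println("submission failed: " + e.getMessage());
        }
        return client;
    }

    public static void main(String[] args) {
        System.out.println("closed = " + runProgram().closed.get());
    }
}
```

Without the try-with-resources block (or an equivalent finally), the 
exception would leave the client open, which matches the symptom discussed 
in this comment.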
