[jira] [Updated] (FLINK-14434) Dispatcher#createJobManagerRunner should returns on creation succeed, not after startJobManagerRunner

2019-10-21 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-14434:
--
Affects Version/s: 1.8.2, 1.9.1

> Dispatcher#createJobManagerRunner should returns on creation succeed, not 
> after startJobManagerRunner
> -
>
> Key: FLINK-14434
> URL: https://issues.apache.org/jira/browse/FLINK-14434
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.8.2, 1.10.0, 1.9.1
> Reporter: Zili Chen
> Assignee: Zili Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Attachments: patch.diff
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Consider an edge case in which
> 1) the job finishes almost immediately, and
> 2) the Dispatcher is suspended inside {{#startJobManagerRunner}} after 
> {{jobManagerRunner.start();}} but before {{return jobManagerRunner;}}.
> This is possible because
> 1) the future stored in {{jobManagerRunnerFutures}} completes only once 
> {{#startJobManagerRunner}} has finished, and
> 2) the creation of the JobManagerRunner doesn't happen in the MainThread.
> A possible execution order is:
> 1) the JobManagerRunner is created in an akka-dispatcher thread
> 2) that thread then applies {{Dispatcher#startJobManagerRunner}}
> 3) it runs past {{jobManagerRunner.start();}} but has not yet reached 
> {{return jobManagerRunner;}}
> 4) this thread is suspended
> 5) the job finishes and its callback is executed on the MainThread
> 6) {{jobManagerRunnerFutures.get(jobID).getNow(null)}} returns {{null}} 
> because the akka-dispatcher thread has not yet executed 
> {{return jobManagerRunner;}}
> 7) the Dispatcher reports {{There is a newer JobManagerRunner for the job}} 
> although there is none.
> **Solution**
> There are two options, and we can even apply both.
> 1. Return {{jobManagerRunnerFuture}} from {{#createJobManagerRunner}} as soon 
> as creation succeeds, and make {{#startJobManagerRunner}} a separate action.
> 2. Once the JobManagerRunner is created, execute {{#startJobManagerRunner}} 
> in the MainThread.
> CC [~trohrmann]
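
The execution order described above can be reproduced in isolation. The following is only a minimal sketch, not Flink code: the names RunnerFutureRace, Runner, and observedByCallback are hypothetical, a single-threaded executor stands in for the akka-dispatcher thread, and latches force the suspension between start() and the return. It compares registering the future that completes after start (the behaviour the issue describes) with registering the future that completes on creation (proposed option 1).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RunnerFutureRace {

    /** Hypothetical stand-in for a JobManagerRunner; start() does nothing here. */
    static final class Runner {
        void start() {}
    }

    /**
     * Returns what a "job finished" callback sees via getNow(null) while the
     * worker thread is paused between start() and "return runner".
     * completeAfterStart = true registers the future that completes after start;
     * false registers the future that completes as soon as the runner is created.
     */
    static Runner observedByCallback(boolean completeAfterStart) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Map<String, CompletableFuture<Runner>> futures = new ConcurrentHashMap<>();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        try {
            // Creation happens off the calling ("main") thread.
            CompletableFuture<Runner> created =
                    CompletableFuture.supplyAsync(Runner::new, worker);
            // Start runs on the same worker; the latch simulates the suspension.
            CompletableFuture<Runner> startedFuture = created.thenApplyAsync(runner -> {
                runner.start();
                started.countDown();
                awaitQuietly(resume);
                return runner;
            }, worker);

            futures.put("job", completeAfterStart ? startedFuture : created);

            // The callback fires after start() but before the worker has returned.
            started.await();
            Runner seen = futures.get("job").getNow(null);

            resume.countDown();
            startedFuture.get(); // let the worker finish before shutting down
            return seen;
        } finally {
            worker.shutdown();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("future completes after start: callback sees runner = "
                + (observedByCallback(true) != null));
        System.out.println("future completes on creation: callback sees runner = "
                + (observedByCallback(false) != null));
    }
}
```

In the first case the lookup yields null, which corresponds to step 6 of the execution order; in the second case the runner is visible to the callback. The sketch does not cover option 2 (running the start step on the main thread).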



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-14434) Dispatcher#createJobManagerRunner should returns on creation succeed, not after startJobManagerRunner

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-14434:
---
Labels: pull-request-available  (was: )

> Dispatcher#createJobManagerRunner should returns on creation succeed, not 
> after startJobManagerRunner
> -
>
> Key: FLINK-14434
> URL: https://issues.apache.org/jira/browse/FLINK-14434
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.10.0
> Reporter: Zili Chen
> Assignee: Zili Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Attachments: patch.diff
>
>





[jira] [Updated] (FLINK-14434) Dispatcher#createJobManagerRunner should returns on creation succeed, not after startJobManagerRunner

2019-10-17 Thread Zili Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zili Chen updated FLINK-14434:
--
Attachment: patch.diff

> Dispatcher#createJobManagerRunner should returns on creation succeed, not 
> after startJobManagerRunner
> -
>
> Key: FLINK-14434
> URL: https://issues.apache.org/jira/browse/FLINK-14434
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.10.0
> Reporter: Zili Chen
> Assignee: Zili Chen
> Priority: Major
> Fix For: 1.10.0
>
> Attachments: patch.diff
>
>


