[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-22 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979854#comment-15979854
 ] 

mingleizhang commented on FLINK-6319:
-

+1
I have to say, the {{shutdownNow}} method has its limitations.

> Add timeout when shutting SystemProcessingTimeService down
> --
>
> Key: FLINK-6319
> URL: https://issues.apache.org/jira/browse/FLINK-6319
> Project: Flink
> Issue Type: Improvement
> Components: Local Runtime
> Affects Versions: 1.3.0
> Reporter: Till Rohrmann
> Priority: Minor
>
> A user noted that we simply call {{shutdownNow}} on the 
> {{SystemProcessingTimeService}}'s {{ScheduledThreadPoolExecutor}} when 
> calling {{SystemProcessingTimeService.shutdownService}}. {{shutdownNow}} will 
> halt all waiting tasks but it won't wait until the currently running tasks 
> have been completed. This can lead to unwanted runtime behaviours such as 
> wrong termination orders when shutting down tasks (as reported in 
> https://issues.apache.org/jira/browse/FLINK-4973?focusedCommentId=15965884&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15965884).
> I propose to add a small timeout to wait for currently running tasks to 
> complete. Even though this problem cannot be completely solved since timer 
> tasks might take longer than the specified timeout, a timeout for waiting for 
> running tasks to complete will mitigate the problem.
> We can do this by calling {{timerService.awaitTermination(timeout, 
> timeoutUnit);}} after the {{shutdownNow}} call.
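
For illustration, the proposal quoted above could look roughly like the sketch below. It is not Flink's actual code: the class {{TimerShutdownSketch}} and the helpers {{shutdownAndAwait}} and {{runDemo}} are hypothetical names, and only the {{java.util.concurrent}} calls are real API. The demo task deliberately ignores interruption, so it also shows the case from the description where a timer task outlives the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the SystemProcessingTimeService implementation.
class TimerShutdownSketch {

    /**
     * Calls shutdownNow() and then waits up to the timeout for running tasks.
     * Returns true if the executor terminated within the timeout.
     */
    static boolean shutdownAndAwait(
            ScheduledThreadPoolExecutor timerService, long timeout, TimeUnit timeoutUnit) {
        // Drops waiting tasks and interrupts running ones, but returns immediately.
        timerService.shutdownNow();
        try {
            // Bounded wait for tasks that were already running.
            return timerService.awaitTermination(timeout, timeoutUnit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Starts one task that ignores interruption for about taskMillis, then shuts down. */
    static boolean runDemo(long taskMillis, long timeoutMillis) {
        ScheduledThreadPoolExecutor timerService = new ScheduledThreadPoolExecutor(1);
        CountDownLatch started = new CountDownLatch(1);
        timerService.execute(() -> {
            started.countDown();
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(taskMillis);
            while (System.nanoTime() < deadline) {
                // Busy work that does not react to interruption.
            }
        });
        try {
            // Make sure the task is running before the shutdown starts.
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task start", e);
        }
        return shutdownAndAwait(timerService, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        System.out.println("short task finished within timeout: " + runDemo(50, 2000));
        System.out.println("long task finished within timeout: " + runDemo(1000, 50));
    }
}
```

The boolean result lets the caller decide what to do when the timeout expires, which is the case the timeout can only mitigate.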



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-21 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978427#comment-15978427
 ] 

Till Rohrmann commented on FLINK-6319:
--

The YARN tests don't cancel jobs, as far as I know. Let me check with the user 
where he has seen the failing test case and check whether he can share the logs 
with me.


[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977101#comment-15977101
 ] 

Stephan Ewen commented on FLINK-6319:
-

On clean shutdown, these errors should not occur. Do the YARN tests run a job, 
trigger a "cancel", and expect the logs to be error-free?


[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-20 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976664#comment-15976664
 ] 

Till Rohrmann commented on FLINK-6319:
--

Yes, I think it is a bit of both. The error is logged in the timer task as a 
warning, and due to the way the YARN tests are written this causes a test 
failure. I think at the moment we're logging the exception in the timer task 
because it's not fatal, and thus we don't want to pass it to the 
{{SystemProcessingTimeService}}'s {{AsyncExceptionHandler}}, which would fail 
the underlying {{Task}}.


[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976628#comment-15976628
 ] 

Stephan Ewen commented on FLINK-6319:
-

Under successful shutdown, this happens anyway, because we first call 
{{quiesceAndAwaitPending()}}, which waits for in-flight timers.
The {{shutdownNow()}} only comes in cancellation / failure situations, where no 
correctness guarantees are given. It does matter, though, to shut down as fast 
as possible. That was the initial thinking.

I think the fact that the {{LocalBufferPool}} is destroyed before the latency 
marker emission timer task has been completed should not matter on cancellation.
If the issue is about polluted logs, then my take is that the logging is in the 
wrong place: it happens in code that is unaware of the context and so cannot 
tell whether the error means something or not.
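
As a rough illustration of context-aware logging, a timer task could be wrapped so that a failure is only reported while the owner is still running. This is a sketch under assumptions, not Flink code: {{ContextAwareTimerTaskSketch}}, the {{shuttingDown}} flag, the {{warnings}} list (a stand-in for a logger) and the {{contextAware}} helper are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of context-aware error reporting for timer tasks.
class ContextAwareTimerTaskSketch {

    /** Stand-in for a logger; collects warnings so they can be inspected. */
    static final List<String> warnings = new CopyOnWriteArrayList<>();

    /** Assumed to be set by the owner once cancellation or shutdown has begun. */
    static final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    /** Wraps a task so that failures are only reported while the owner is running. */
    static Runnable contextAware(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                if (shuttingDown.get()) {
                    // Expected during cancellation, for example when a resource
                    // the task depends on has already been released.
                    return;
                }
                warnings.add("Timer task failed: " + e.getMessage());
            }
        };
    }

    public static void main(String[] args) {
        Runnable failing = contextAware(() -> {
            throw new IllegalStateException("illustrative failure");
        });
        failing.run();            // reported: the owner is still running
        shuttingDown.set(true);
        failing.run();            // suppressed: failure during shutdown
        System.out.println(warnings);
    }
}
```

Whether such failures should be dropped or logged at a lower level during cancellation is a design choice this sketch leaves open.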


[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-19 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974260#comment-15974260
 ] 

Till Rohrmann commented on FLINK-6319:
--

This would then mean that we cannot guarantee that the {{LocalBufferPool}} is 
only shut down after the latency marker emission timer task has been completed 
(even with a small timeout this could still happen, but is much less likely).


[jira] [Commented] (FLINK-6319) Add timeout when shutting SystemProcessingTimeService down

2017-04-18 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972775#comment-15972775
 ] 

Aljoscha Krettek commented on FLINK-6319:
-

I think [~StephanEwen] had some thoughts on why we have it like that. I think 
he said that a shutdown should never be blocked on anything and that's why we 
don't wait. I might be completely wrong and misrepresenting what he said, 
though.
