[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2023-08-17 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-13698:
---
Priority: Not a Priority  (was: Minor)

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Not a Priority
>  Labels: auto-deprioritized-critical, auto-deprioritized-major, 
> stale-minor
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2023-08-17 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-13698:
---
Labels: auto-deprioritized-critical auto-deprioritized-major stale-minor  
(was: auto-deprioritized-critical auto-deprioritized-major)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issues has been marked as 
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated 
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is 
still Minor, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Minor
>  Labels: auto-deprioritized-critical, auto-deprioritized-major, 
> stale-minor
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-09-29 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-13698:
---
Fix Version/s: (was: 1.15.0)

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Minor
>  Labels: auto-deprioritized-critical, auto-deprioritized-major
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-09-28 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-13698:
-
Fix Version/s: (was: 1.14.0)
   1.15.0

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Minor
>  Labels: auto-deprioritized-critical, auto-deprioritized-major
> Fix For: 1.15.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-07-06 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-13698:
---
  Labels: auto-deprioritized-critical auto-deprioritized-major  (was: 
auto-deprioritized-critical stale-major)
Priority: Minor  (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Minor
>  Labels: auto-deprioritized-critical, auto-deprioritized-major
> Fix For: 1.14.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-06-28 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-13698:
---
Labels: auto-deprioritized-critical stale-major  (was: 
auto-deprioritized-critical)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issues has been marked as 
Major but is unassigned and neither itself nor its Sub-Tasks have been updated 
for 30 days. I have gone ahead and added a "stale-major" to the issue". If this 
ticket is a Major, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: auto-deprioritized-critical, stale-major
> Fix For: 1.14.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-05-27 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-13698:
---
  Labels: auto-deprioritized-critical  (was: stale-critical)
Priority: Major  (was: Critical)

This issue was labeled "stale-critical" 7 ago and has not received any updates 
so it is being deprioritized. If this ticket is actually Critical, please raise 
the priority and ask a committer to assign you the issue or revive the public 
discussion.


> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: auto-deprioritized-critical
> Fix For: 1.14.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-05-20 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-13698:
---
Labels: stale-critical  (was: )

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issues has been marked as 
Critical but is unassigned and neither itself nor its Sub-Tasks have been 
updated for 7 days. I have gone ahead and marked it "stale-critical". If this 
ticket is critical, please either assign yourself or give an update. 
Afterwards, please remove the label or in 7 days the issue will be 
deprioritized.


> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Critical
>  Labels: stale-critical
> Fix For: 1.14.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-04-26 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-13698:
-
Fix Version/s: (was: 1.13.0)
   1.14.0

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Critical
> Fix For: 1.14.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2021-04-23 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-13698:
--
Priority: Critical  (was: Major)

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Priority: Critical
> Fix For: 1.13.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2020-12-07 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-13698:
---
Fix Version/s: (was: 1.12.0)
   1.13.0

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Assignee: Biao Liu
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2019-12-10 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-13698:
---
Fix Version/s: (was: 1.10.0)
   1.11.0

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Assignee: Biao Liu
>Priority: Blocker
> Fix For: 1.11.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2019-12-10 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-13698:
---
Priority: Major  (was: Blocker)

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Assignee: Biao Liu
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2019-12-10 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-13698:
--
Priority: Blocker  (was: Critical)

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Assignee: Biao Liu
>Priority: Blocker
> Fix For: 1.10.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2019-12-10 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-13698:
--
Fix Version/s: 1.10.0

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Assignee: Biao Liu
>Priority: Critical
> Fix For: 1.10.0
>
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13698) Rework threading model of CheckpointCoordinator

2019-08-12 Thread Piotr Nowojski (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-13698:
---
Issue Type: Improvement  (was: Bug)

> Rework threading model of CheckpointCoordinator
> ---
>
> Key: FLINK-13698
> URL: https://issues.apache.org/jira/browse/FLINK-13698
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.10.0
>Reporter: Piotr Nowojski
>Assignee: Biao Liu
>Priority: Critical
>
> Currently {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is 
> executed by multiple different threads (mostly {{ioExecutor}}, but not only). 
> It's causing multiple concurrency issues, for example: 
> https://issues.apache.org/jira/browse/FLINK-13497
> Proper fix would be to rethink threading model there. At first glance it 
> doesn't seem that this code should be multi threaded, except of parts doing 
> the actual IO operations, so it should be possible to run everything in one 
> single ExecutionGraph's thread and just run asynchronously necessary IO 
> operations with some feedback loop ("mailbox style").
> I would strongly recommend fixing this issue before adding new features in 
> the \{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)