[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2018-05-27 Thread eliaslevy
Github user eliaslevy commented on the issue:

https://github.com/apache/flink/pull/3334
  
Any chance this will be merged now that 1.5 is out?


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-09 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
@StephanEwen 
No problem. I appreciate your time and efforts. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-09 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3334
  
@ramkrish86 I would like to get to this one here after the additions to the 
checkpoint coordinator I am currently working on are done.


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-08 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
@StephanEwen 
I saw one of your comments in another JIRA where you talked about 
refactoring CheckpointCoordinator and PendingCheckpoint. So would you like this 
PR to wait until then?


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-06 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
Just updated and did a force push to avoid the merge commit. Now things are 
fine.


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-03 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
Ping for reviews here!!!


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-03 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
@StephanEwen , @wenlong88 , @shixiaogang 
Please have a look at the latest push. I am now tracking checkpoint failures 
and incrementing a new counter based on them. I have also added test cases. 
I have not changed the constructors of the affected class because doing so 
touches many files. I can update them based on the feedback on the latest PR.


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-02 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
I think I have a better way to track this. Will update the PR soon.


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-01 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
Thanks for the input. I read the code. As I understand it, there are two ways 
a checkpoint can fail. If checkpointing cannot be performed for some reason, we 
send a DeclineCheckpoint message, which is handled by the CheckpointCoordinator.
The other case is an external error during checkpointing, in which case we call 
failExternally. That transitions the state to FAILED, closes all the watchdogs, 
and cancels the invokable. Is the intent to count how many times such failures 
occur and then fail the execution graph?
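
To make the question concrete, here is a minimal sketch of one way both failure paths could feed a single counter that fails the job once a limit is reached. Everything in it is an assumption for illustration: `CheckpointFailureTracker`, its method names, the `maxConsecutiveFailures` limit, and the `failJob` callback are hypothetical and are not taken from the PR or from Flink.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; these names are not part of Flink.
final class CheckpointFailureTracker {

    private final int maxConsecutiveFailures;
    private final Runnable failJob;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    CheckpointFailureTracker(int maxConsecutiveFailures, Runnable failJob) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.failJob = failJob;
    }

    /** Path 1: a task declined the checkpoint. */
    void onCheckpointDeclined() {
        recordFailure();
    }

    /** Path 2: the task failed externally while checkpointing. */
    void onTaskFailedExternally() {
        recordFailure();
    }

    /** A completed checkpoint clears the count. */
    void onCheckpointCompleted() {
        consecutiveFailures.set(0);
    }

    private void recordFailure() {
        // Fire the callback exactly when the limit is reached, not on every later failure.
        if (consecutiveFailures.incrementAndGet() == maxConsecutiveFailures) {
            failJob.run();
        }
    }
}
```

With a limit of 2, one declined checkpoint followed by one external failure would run the callback once; a completed checkpoint in between would reset the count.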


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-01 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
I think I understand what you are saying. Execution#triggerCheckpoint is the 
actual checkpoint call, and currently we don't track failures there. So your 
point is that it is better to know whether the actual checkpoint triggering 
failed at the Task level and to count that as a failure. Am I right, 
@wenlong88 ?


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-03-01 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
@wenlong88 
Can you say more about what you mean by checkpointing failure and trigger 
failure? If you mean tracking the number of times the execution fails after 
restoring from a checkpoint, I think FLINK-4815 is trying to address that.


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-02-28 Thread wenlong88
Github user wenlong88 commented on the issue:

https://github.com/apache/flink/pull/3334
  
Currently the `numUnsuccessfulCheckpointsTriggers` counter is reset after a 
successful trigger rather than after a successful checkpoint. But I think 
trigger failures are actually rare, and monitoring checkpoint failures is more 
valuable. What do you think?
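
The difference between the two reset points can be sketched as follows. This is only an illustration under assumed names: `UnsuccessfulCheckpointCounter`, `ResetPolicy`, and the event methods are hypothetical and do not reflect the actual code in the PR.

```java
// Hypothetical illustration only; these names are not part of Flink.
final class UnsuccessfulCheckpointCounter {

    /** When the consecutive-failure count is cleared. */
    enum ResetPolicy { ON_SUCCESSFUL_TRIGGER, ON_COMPLETED_CHECKPOINT }

    private final ResetPolicy policy;
    private int consecutiveFailures;

    UnsuccessfulCheckpointCounter(ResetPolicy policy) {
        this.policy = policy;
    }

    /** The trigger request went out successfully. */
    void onTriggerSucceeded() {
        if (policy == ResetPolicy.ON_SUCCESSFUL_TRIGGER) {
            consecutiveFailures = 0;
        }
    }

    /** The trigger failed, or the checkpoint later failed or was declined. */
    void onFailure() {
        consecutiveFailures++;
    }

    /** The checkpoint completed; both policies reset here. */
    void onCheckpointCompleted() {
        consecutiveFailures = 0;
    }

    int failures() {
        return consecutiveFailures;
    }
}
```

If every trigger succeeds but every checkpoint then fails, the `ON_SUCCESSFUL_TRIGGER` policy never lets the count rise above 1, while `ON_COMPLETED_CHECKPOINT` keeps accumulating, which is the case that monitoring checkpoint failures is meant to catch.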


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-02-27 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3334
  
@StephanEwen - Ping for initial reviews. Will work on it based on the 
feedback.


---


[GitHub] flink issue #3334: FLINK-4810 Checkpoint Coordinator should fail ExecutionGr...

2017-02-17 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3334
  
Thank you for opening this pull request.
I'll try to review it in the coming days...


---