[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined
pnowojski commented on a change in pull request #13180: URL: https://github.com/apache/flink/pull/13180#discussion_r473803693 ## File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/AsyncCheckpointRunnable.java ## @@ -129,12 +129,10 @@ public void run() { checkpointMetaData.getCheckpointId()); } } catch (Exception e) { - if (LOG.isDebugEnabled()) { - LOG.debug("{} - asynchronous part of checkpoint {} could not be completed.", - taskName, - checkpointMetaData.getCheckpointId(), - e); - } + LOG.info("{} - asynchronous part of checkpoint {} could not be completed.", Review comment: > Honestly, I have not had this to me, I’m not against this change here. Ok, in that case let's not overthink it and let's try this out :) Thanks for your inputs @NicoK and @klion26 . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined
pnowojski commented on a change in pull request #13180: URL: https://github.com/apache/flink/pull/13180#discussion_r472352387 ## File path: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/AsyncCheckpointRunnable.java ## @@ -129,12 +129,10 @@ public void run() { checkpointMetaData.getCheckpointId()); } } catch (Exception e) { - if (LOG.isDebugEnabled()) { - LOG.debug("{} - asynchronous part of checkpoint {} could not be completed.", - taskName, - checkpointMetaData.getCheckpointId(), - e); - } + LOG.info("{} - asynchronous part of checkpoint {} could not be completed.", Review comment: Won't this flood log with exceptions in some cases? There was some [relevant discussion](https://github.com/apache/flink/pull/9873/files/d62e6daf023f440f25fa2ea5a55081e513a5fcd0#diff-9437cb4899bb946f234c272be6724770) about this when it was being introduced. CC @klion26 ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined
pnowojski commented on a change in pull request #13180: URL: https://github.com/apache/flink/pull/13180#discussion_r472234731 ## File path: flink-end-to-end-tests/test-scripts/common.sh ## @@ -387,6 +387,7 @@ function check_logs_for_exceptions { | grep -v "WARN org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline" \ | grep -v 'INFO.*AWSErrorCode' \ | grep -v "RejectedExecutionException" \ + | grep -v "CancellationException" \ Review comment: In that case can you squash this with the commit that introduced the problem? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined
pnowojski commented on a change in pull request #13180: URL: https://github.com/apache/flink/pull/13180#discussion_r472234400 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ## @@ -887,7 +887,8 @@ public void receiveDeclineMessage(DeclineCheckpoint message, String taskManagerL checkpointId, message.getTaskExecutionId(), job, - taskManagerLocationInfo); + taskManagerLocationInfo, + message.getReason()); Review comment: Good to know, I was not aware of that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined
pnowojski commented on a change in pull request #13180: URL: https://github.com/apache/flink/pull/13180#discussion_r47214 ## File path: flink-end-to-end-tests/test-scripts/common.sh ## @@ -387,6 +387,7 @@ function check_logs_for_exceptions { | grep -v "WARN org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline" \ | grep -v 'INFO.*AWSErrorCode' \ | grep -v "RejectedExecutionException" \ + | grep -v "CancellationException" \ Review comment: Why has this had to be added? Is it caused by one of your change? ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ## @@ -887,7 +887,8 @@ public void receiveDeclineMessage(DeclineCheckpoint message, String taskManagerL checkpointId, message.getTaskExecutionId(), job, - taskManagerLocationInfo); + taskManagerLocationInfo, + message.getReason()); Review comment: Isn't this is missing a pattern/format change? Also how would you like it to be logged? Just the `message.getReason().toString()`? Do we care about the stack trace? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org