[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined

2020-08-20 Thread GitBox


pnowojski commented on a change in pull request #13180:
URL: https://github.com/apache/flink/pull/13180#discussion_r473803693



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/AsyncCheckpointRunnable.java
##
@@ -129,12 +129,10 @@ public void run() {
checkpointMetaData.getCheckpointId());
}
} catch (Exception e) {
-   if (LOG.isDebugEnabled()) {
-   LOG.debug("{} - asynchronous part of checkpoint 
{} could not be completed.",
-   taskName,
-   checkpointMetaData.getCheckpointId(),
-   e);
-   }
+   LOG.info("{} - asynchronous part of checkpoint {} could 
not be completed.",

Review comment:
   > Honestly, I have not had this to me, I’m not against this change here.
   
   Ok, in that case let's not overthink it and let's try this out :) Thanks for 
your inputs @NicoK and @klion26 .





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined

2020-08-18 Thread GitBox


pnowojski commented on a change in pull request #13180:
URL: https://github.com/apache/flink/pull/13180#discussion_r472352387



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/AsyncCheckpointRunnable.java
##
@@ -129,12 +129,10 @@ public void run() {
checkpointMetaData.getCheckpointId());
}
} catch (Exception e) {
-   if (LOG.isDebugEnabled()) {
-   LOG.debug("{} - asynchronous part of checkpoint 
{} could not be completed.",
-   taskName,
-   checkpointMetaData.getCheckpointId(),
-   e);
-   }
+   LOG.info("{} - asynchronous part of checkpoint {} could 
not be completed.",

Review comment:
   Won't this flood log with exceptions in some cases?
   
   There was some [relevant 
discussion](https://github.com/apache/flink/pull/9873/files/d62e6daf023f440f25fa2ea5a55081e513a5fcd0#diff-9437cb4899bb946f234c272be6724770)
 about this when it was being introduced.
   
   CC @klion26 ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined

2020-08-18 Thread GitBox


pnowojski commented on a change in pull request #13180:
URL: https://github.com/apache/flink/pull/13180#discussion_r472234731



##
File path: flink-end-to-end-tests/test-scripts/common.sh
##
@@ -387,6 +387,7 @@ function check_logs_for_exceptions {
| grep -v  "WARN  
org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline" \
| grep -v 'INFO.*AWSErrorCode' \
| grep -v "RejectedExecutionException" \
+   | grep -v "CancellationException" \

Review comment:
   In that case can you squash this with the commit that introduced the 
problem?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined

2020-08-18 Thread GitBox


pnowojski commented on a change in pull request #13180:
URL: https://github.com/apache/flink/pull/13180#discussion_r472234400



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##
@@ -887,7 +887,8 @@ public void receiveDeclineMessage(DeclineCheckpoint 
message, String taskManagerL
checkpointId,
message.getTaskExecutionId(),
job,
-   taskManagerLocationInfo);
+   taskManagerLocationInfo,
+   message.getReason());

Review comment:
   Good to know, I was not aware of that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pnowojski commented on a change in pull request #13180: [FLINK-18962][checkpointing] Improve logging when checkpoint declined

2020-08-18 Thread GitBox


pnowojski commented on a change in pull request #13180:
URL: https://github.com/apache/flink/pull/13180#discussion_r47214



##
File path: flink-end-to-end-tests/test-scripts/common.sh
##
@@ -387,6 +387,7 @@ function check_logs_for_exceptions {
| grep -v  "WARN  
org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline" \
| grep -v 'INFO.*AWSErrorCode' \
| grep -v "RejectedExecutionException" \
+   | grep -v "CancellationException" \

Review comment:
   Why has this had to be added? Is it caused by one of your change?

##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##
@@ -887,7 +887,8 @@ public void receiveDeclineMessage(DeclineCheckpoint 
message, String taskManagerL
checkpointId,
message.getTaskExecutionId(),
job,
-   taskManagerLocationInfo);
+   taskManagerLocationInfo,
+   message.getReason());

Review comment:
   Isn't this is missing a pattern/format change? Also how would you like 
it to be logged? Just the `message.getReason().toString()`? Do we care about 
the stack trace?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org