[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1188101108 ## flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java: ## @@ -473,26 +500,50 @@ public CompletableFuture cancel(Time timeout) { @Override public CompletableFuture updateTaskExecutionState( final TaskExecutionState taskExecutionState) { -FlinkException taskExecutionException; +checkNotNull(taskExecutionState, "taskExecutionState"); +// Use the main/caller thread for all updates to make sure they are processed in order. +// (MainThreadExecutor i.e., the akka thread pool does not guarantee that) +// Only detach for a FAILED state update that is terminal and may perform io heavy labeling. +if (ExecutionState.FAILED.equals(taskExecutionState.getExecutionState())) { +return labelFailure(taskExecutionState) +.thenApplyAsync( +taskStateWithLabels -> { +try { +return doUpdateTaskExecutionState(taskStateWithLabels); +} catch (FlinkException e) { +throw new CompletionException(e); +} +}, +getMainThreadExecutor()); +} try { -checkNotNull(taskExecutionState, "taskExecutionState"); +return CompletableFuture.completedFuture( +doUpdateTaskExecutionState(taskExecutionState)); +} catch (FlinkException e) { +return FutureUtils.completedExceptionally(e); +} +} +private Acknowledge doUpdateTaskExecutionState(final TaskExecutionState taskExecutionState) +throws FlinkException { +@Nullable FlinkException taskExecutionException; +try { if (schedulerNG.updateTaskExecutionState(taskExecutionState)) { Review Comment: Looks like the main decision we have to take here is if the failure-labels going to be used by restart strategies or not -- relying on enrichment for restarts is what makes it crucial. The existing implementation was [based on the assumption](https://lists.apache.org/thread/tq8yrncg7zqtpc8ddpxrkxfpovs1wkkw) that labels are going to be used by the custom restart strategies in the future. Since we wanted them asynchronous, the less risky way was through existing async calls e.g., `JobMaster#updateTaskExecutionState`, and probably modifying the InternalFailuresListener (rather than changing SchedulerNG update state to async). Deciding the failure enrichment is crucial enough to be synchronous -- maybe part of `DefaultScheduler#restartTasksWithDelay`-- is also an option. However, decoupling failure labels completely from restart strategies sounds like a step back here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases within the ExecutionGraph that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here was that all these failures are internal and could maybe skip labeling. Davids suggestion following all the `fromSchedulerNg == false` paths could also work (need to verify) -- but happy to discuss other thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases within the ExecutionGraph that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here was that all these failures are internal and could maybe skip labeling. Davids suggestion following all the `fromSchedulerNg == false` paths could also work -- but happy to discuss other thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases within the ExecutionGraph that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here was that all these failures are internal and could maybe skip labeling -- Davids suggestion following all the `fromSchedulerNg == false` paths also seems to cover this -- but happy to discuss other thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases part of the Deployment/Execution that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here was that all these failures are internal and could maybe skip labeling -- Davids suggestion following all the `fromSchedulerNg == false` paths also seems to cover this -- but happy to discuss other thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases part of the Deployment/Execution that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here is that all failures are internal and could skip labeling -- but happy to discuss other thoughts? ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases part of the Deployment/Execution that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here is that all these failures are internal and could skip labeling -- but happy to discuss other thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases part of the Deployment/Execution that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) The intuition here is that all failures are internal and could skip labeling -- but happy to discuss alternatives -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases part of the Deployment/Execution that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases part of the Deployment/Execution that are not easily covered, e.g., [deploy](https://github.com/pgaref/flink/blob/41728f1bae85ce19f843fc18d0d80ddfe75af6b9/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L608), [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104), [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But there are still corner cases that are not easily covered, e.g., [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104) [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185674060 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java: ## @@ -778,7 +778,7 @@ private static PartitionInfo createFinishedPartitionInfo( */ @Override public void fail(Throwable t) { -processFail(t, true); +processFail(t, true, Collections.emptyMap()); Review Comment: Again this is covered with the next PR: https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR781-R782 But still there are corner cases that are not easily covered, e.g., [triggerTaskFailover](https://github.com/apache/flink/pull/22511/files#diff-c7f8fc6bbf59b935414529c76720dcafce6105e16783b918af6b06a10e247ab2R103-R104) [UpdatePartitionInfo](https://github.com/apache/flink/pull/22511/files#diff-f9e390431932be9a4505e581632815b81c1f0520e65cec7614ea4b90c8a52afcR1378) Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185672001 ## flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java: ## @@ -473,26 +500,50 @@ public CompletableFuture cancel(Time timeout) { @Override public CompletableFuture updateTaskExecutionState( final TaskExecutionState taskExecutionState) { -FlinkException taskExecutionException; +checkNotNull(taskExecutionState, "taskExecutionState"); +// Use the main/caller thread for all updates to make sure they are processed in order. +// (MainThreadExecutor i.e., the akka thread pool does not guarantee that) +// Only detach for a FAILED state update that is terminal and may perform io heavy labeling. +if (ExecutionState.FAILED.equals(taskExecutionState.getExecutionState())) { +return labelFailure(taskExecutionState) +.thenApplyAsync( +taskStateWithLabels -> { +try { +return doUpdateTaskExecutionState(taskStateWithLabels); +} catch (FlinkException e) { +throw new CompletionException(e); +} +}, +getMainThreadExecutor()); +} try { -checkNotNull(taskExecutionState, "taskExecutionState"); +return CompletableFuture.completedFuture( +doUpdateTaskExecutionState(taskExecutionState)); +} catch (FlinkException e) { +return FutureUtils.completedExceptionally(e); +} +} +private Acknowledge doUpdateTaskExecutionState(final TaskExecutionState taskExecutionState) +throws FlinkException { +@Nullable FlinkException taskExecutionException; +try { if (schedulerNG.updateTaskExecutionState(taskExecutionState)) { Review Comment: Btw, InternalFailuresListener#notifyTaskFailure is already covered as part of FLINK-31891 -- when a TM disconnection happens we need to Release Payload Slot and since the error is not fromSchedulerNg we use the internalTaskFailuresListener: https://github.com/apache/flink/pull/22511/files#diff-d535f910a10f835962b0637e12014068a9727b2152a84223fd9b1bf9c6c074d6R1665 This is not however covering the `onMissingDeploymentsOf` case David mentioned above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1185667103 ## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ErrorInfo.java: ## @@ -98,4 +112,13 @@ public String getExceptionAsString() { public long getTimestamp() { return timestamp; } + +/** + * Returns the labels associated with the exception. + * + * @return Map of exception labels + */ +public Map getLabels() { +return labels; Review Comment: Great idea, added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1184449298 ## flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/factories/DefaultJobMasterServiceFactory.java: ## @@ -122,6 +123,7 @@ private JobMasterService internalCreateJobMasterService( DefaultExecutionDeploymentReconciler::new, BlocklistUtils.loadBlocklistHandlerFactory( jobMasterConfiguration.getConfiguration()), +Collections.emptySet(), Review Comment: Right, this is a placeholder for loaded FailureEnrichers using a conf (probably from ClusterEndpoint or Dispatcher -- empty for now to facilitate testing) -- tracking this as part of FLINK-31993 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1184450001 ## flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryEntryMatcher.java: ## @@ -87,6 +107,18 @@ protected boolean matchesSafely( match = false; } +if (exceptionHistoryEntry.getLabels() == null) { +if (expectedLabels != null) { +description.appendText(" actualLabels=null"); +match = false; +} +} else if (exceptionHistoryEntry.getLabels().equals(expectedLabels)) { +description +.appendText(" actualLabels=") + .appendText(String.valueOf(exceptionHistoryEntry.getLabels())); +match = false; Review Comment: Agreed, this is now simplified -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] pgaref commented on a diff in pull request #22506: [FLINK-31890][runtime] Introduce JobMaster per-task failure enrichment/labeling
pgaref commented on code in PR #22506: URL: https://github.com/apache/flink/pull/22506#discussion_r1184449298 ## flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/factories/DefaultJobMasterServiceFactory.java: ## @@ -122,6 +123,7 @@ private JobMasterService internalCreateJobMasterService( DefaultExecutionDeploymentReconciler::new, BlocklistUtils.loadBlocklistHandlerFactory( jobMasterConfiguration.getConfiguration()), +Collections.emptySet(), Review Comment: Right, this is a placeholder for loaded FailureEnrichers using a conf (empty for now to facilitate testing) -- tracking this as part of FLINK-31993 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org