[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-18 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446303#comment-17446303
 ] 

Luke Chen commented on KAFKA-13370:
---

[~rhauch] , PR: [https://github.com/apache/kafka/pull/11509] is created to add 
unit tests for offset-commit metrics. Thank you.

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1, 2.8.2
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e the 
> *offset-commit-failure-percentage* metric ** would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *TImeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444638#comment-17444638
 ] 

Randall Hauch commented on KAFKA-13370:
---

I reverted the change (https://github.com/apache/kafka/pull/9642) that caused 
this regression in the following branches:
* `2.8` for inclusion in a future 2.8.2 release
* `3.0` for inclusion in a future 3.0.1 release
* `3.1` for inclusion in the upcoming 3.1.0 release
* `trunk` for inclusion in the next major/minor release (e.g., 3.2.0 or 4.0.0)

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1, 2.8.2
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e the 
> *offset-commit-failure-percentage* metric ** would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *TImeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444615#comment-17444615
 ] 

Randall Hauch commented on KAFKA-13370:
---

Thanks, [~showuon], creating the PR that fixes this issue. But, I think the 
best course of action here is actually to revert #9642, for a few reasons:

# It was actually useful to have that boolean success parameter within the 
protected TaskMetricsGroup.recordCommit(...) helper method, since it allowed 
the WorkerTask.recordCommitSuccess(...) and recordCommitFailure(...) methods to 
call that one method with desired behavior.
# Rolling back is also more encapsulated and significantly easier to backport 
all the way back to the 2.8 branch, especially since we've significantly 
refactored the source connector offset commit logic only in 3.0 and later 
branches.

So I think I'm going to close your PR and revert #9642.

However, I think your unit test improvement to verify the expected metric 
values are very useful. Would you mind creating a new PR with those unit test 
improvements? Since this is a blocker issue for 3.1, it'd be great to do that 
quickly. If that's not feasible (or I don't hear back in the next few days), 
then let's create a new issue for the additional unit test metric verification 
and associate the PR with that new issue.

Thanks!

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e the 
> *offset-commit-failure-percentage* metric ** would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *TImeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-01 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436624#comment-17436624
 ] 

Luke Chen commented on KAFKA-13370:
---

Set this issue as blocker for v3.0.1/v3.1.0 since it's a regression issue.

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e the 
> *offset-commit-failure-percentage* metric ** would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *TImeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-10-20 Thread Vincent Giroux (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431272#comment-17431272
 ] 

Vincent Giroux commented on KAFKA-13370:


Thank you [~showuon] for tackling this issue so quickly !

The changes look good to me. 

P.S. It is also great to see you modified the unit tests so that regression 
about how these metrics are calculated is less likely to occur in the future.

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Minor
> Fix For: 2.8.0
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e the 
> *offset-commit-failure-percentage* metric ** would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *TImeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-10-19 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430429#comment-17430429
 ] 

Luke Chen commented on KAFKA-13370:
---

[~20100g], thanks for reporting the issue. PR is ready now: 
[https://github.com/apache/kafka/pull/11413]

Welcome to provide comments. Thanks.

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Minor
> Fix For: 2.8.0
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e the 
> *offset-commit-failure-percentage* metric ** would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *TImeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)