[jira] [Updated] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-09 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-14974:
--
Fix Version/s: 3.5.0

> Restore backward compatibility in KafkaBasedLog
> ---
>
> Key: KAFKA-14974
> URL: https://issues.apache.org/jira/browse/KAFKA-14974
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Affects Versions: 3.5.0, 3.4.1, 3.3.3
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Major
> Fix For: 3.5.0, 3.4.1, 3.3.3, 3.6.0
>
>
> {{KafkaBasedLog}} is a widely used utility class that provides a generic 
> implementation of a shared, compacted log of records in a Kafka topic. It 
> isn't part of Connect's public API, but it has been used outside of Connect, 
> and we try to preserve backward compatibility whenever possible. 
> https://issues.apache.org/jira/browse/KAFKA-14455 modified the two overloaded 
> void {{KafkaBasedLog::send}} methods to return a {{Future}}. While this 
> change is source compatible, it isn't binary compatible. We can restore 
> backward compatibility simply by reinstating the older send methods and 
> renaming the new Future-returning send methods.
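The source-vs-binary distinction above can be illustrated with a minimal sketch (a hypothetical simplified class, not the real {{KafkaBasedLog}}; the `sendWithReceipt` name comes from the comment thread below, and the method bodies are stand-ins): changing a method's return type from `void` to `Future<?>` changes its descriptor in the class file, so callers compiled against the old signature fail at link time with `NoSuchMethodError` even though recompiling their source would succeed. Keeping the old `void` methods alongside the renamed `Future`-returning ones restores binary compatibility.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Simplified sketch, NOT the real KafkaBasedLog.
class KafkaBasedLogSketch<K, V> {

    // Renamed Future-returning variant; the real method would hand the
    // record to a producer and complete the future on ack.
    public Future<Void> sendWithReceipt(K key, V value) {
        return CompletableFuture.completedFuture(null); // stand-in body
    }

    // Reinstated method with the original descriptor: (K, V) -> void.
    // Callers compiled against this exact signature keep linking, which is
    // what restores binary compatibility.
    public void send(K key, V value) {
        sendWithReceipt(key, value);
    }
}
```

Delegating the old method to the new one keeps a single code path while preserving both signatures.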



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-09 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-14974:
--
Fix Version/s: 3.4.1
   3.3.3



[jira] [Updated] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-09 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-14974:
--
Fix Version/s: 3.6.0



[jira] [Commented] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-08 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720553#comment-17720553
 ] 

Randall Hauch commented on KAFKA-14974:
---

[~yash.mayya], just to clarify:
 # #12984 changed the signatures of the then-`send(...)` methods by adding a 
return value, which breaks binary backward compatibility for this utility 
class.
 # Those changes were made on `trunk` prior to the `3.5` branch ([it's in the 
`3.5` 
history](https://github.com/apache/kafka/commits/3.5?after=f9730c11b7b48a37f527a363e0c6dced53fdbc69+314=3.5_name=refs%2Fheads%2F3.5)),
 and were backported to the `3.4` and `3.3` branches.
 # To restore backward compatibility, this PR renames the methods that return 
a `Future` to `sendWithReceipt(...)` and adds back the two `send(...)` methods 
with the same signatures as before.

Is this correct?



[jira] [Updated] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-08 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-14974:
--
Component/s: KafkaConnect



[jira] [Commented] (KAFKA-14974) Restore backward compatibility in KafkaBasedLog

2023-05-08 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720546#comment-17720546
 ] 

Randall Hauch commented on KAFKA-14974:
---

Thanks for catching this and providing a fix. Indeed, we have tried to maintain 
backward compatibility for this class since it is super useful.



[jira] [Commented] (KAFKA-14079) Source task will not commit offsets and develops memory leak if "error.tolerance" is set to "all"

2022-07-26 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571601#comment-17571601
 ] 

Randall Hauch commented on KAFKA-14079:
---

Merged to the `3.3` branch with permission from the 3.3 release manager.

> Source task will not commit offsets and develops memory leak if 
> "error.tolerance" is set to "all"
> -
>
> Key: KAFKA-14079
> URL: https://issues.apache.org/jira/browse/KAFKA-14079
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.2.0
>Reporter: Christopher L. Shannon
>Assignee: Christopher L. Shannon
>Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.4.0
>
>
> KAFKA-13348 added the ability to ignore producer exceptions by setting 
> {{error.tolerance}} to {{all}}. When this is set to all, a null record 
> metadata is passed to commitRecord() and the task continues.
> The issue is that records are tracked by {{SubmittedRecords}}, and the first 
> time an error happens the code does not ack the record with the error and 
> just skips it, so its offsets are not committed and it is not removed from 
> SubmittedRecords before commitRecord() is called.
> This leads to a bug where future offsets won't be committed anymore, and 
> also to a memory leak, because the algorithm that removes acked records from 
> the internal map to commit offsets [looks 
> |https://github.com/apache/kafka/blob/3.2.0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/SubmittedRecords.java#L177]
>  at the head of the Deque in which the records are tracked, and if it sees 
> that the record is unacked it will not process any more removals. As a 
> result, all new records that go through the task continue to be added but 
> never have their offsets committed and are never removed from tracking, 
> until an OOM error occurs.
> The fix is to make sure to ack the failed records so they can have their 
> offsets committed and be removed from tracking. This is fine to do, as the 
> records are intended to be skipped and not reprocessed. Metrics also need to 
> be updated.
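The head-of-deque behavior described above can be sketched as follows (an illustrative model with simplified names and structure, not the real {{SubmittedRecords}} class): offsets are only committable for the contiguous run of acked records at the head, so one permanently unacked record blocks all later removals and the deque grows without bound.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model, NOT the real SubmittedRecords.
class SubmittedRecordsSketch {
    static final class Record {
        final long offset;
        boolean acked;
        Record(long offset) { this.offset = offset; }
    }

    private final Deque<Record> deque = new ArrayDeque<>();

    Record submit(long offset) {
        Record r = new Record(offset);
        deque.addLast(r);
        return r;
    }

    // Drain acked records from the head of the deque; stop at the first
    // unacked record. Returns the highest committable offset, or -1 if
    // nothing at the head has been acked yet.
    long committableOffset() {
        long committable = -1;
        while (!deque.isEmpty() && deque.peekFirst().acked) {
            committable = deque.pollFirst().offset;
        }
        return committable;
    }

    int pending() { return deque.size(); }
}
```

In this model, a skipped-but-unacked record at the head pins every later record in memory; acking the failed record (the fix) lets the drain advance and frees the tracked records.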





[jira] [Updated] (KAFKA-14079) Source task will not commit offsets and develops memory leak if "error.tolerance" is set to "all"

2022-07-26 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-14079:
--
Fix Version/s: 3.4.0



[jira] [Commented] (KAFKA-14079) Source task will not commit offsets and develops memory leak if "error.tolerance" is set to "all"

2022-07-19 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568626#comment-17568626
 ] 

Randall Hauch commented on KAFKA-14079:
---

Following up with some additional detail:

This issue can affect users that are upgrading to AK 3.2.0, even if they don't 
modify any Connect worker config or connector configurations. For example, if a 
user has a pre-AK 3.2.0 Connect installation running with one or more source 
connector configurations that use {{error.tolerance=all}}, then when that 
Connect installation is upgraded to AK 3.2.0 _and_ the producer subsequently 
fails to send and ack messages generated by the source connector (e.g., message 
too large), Connect will continue to write records to topics but will no longer 
commit source offsets for that connector. As mentioned above, Connect will 
accumulate those additional records in memory, causing the worker to eventually 
fail with an OOM.

Unfortunately, restarting is not likely to be helpful, either: the source 
offsets are not changed/committed once this condition happens, so upon restart 
the connector will resume from the previously committed source offsets and will 
likely regenerate the same problematic messages as before, triggering the 
problem again and causing the same OOM.

The only way to recover is to fix the underlying problem reported by the 
producer (e.g., message too large) and restart the Connect workers. Luckily, 
the problems reported by the producer are captured in the worker logs. Note 
that changing the connector configuration to use {{error.tolerance=none}} will 
cause the connector to stop/fail as soon as the producer fails to write a 
record to the topic (e.g., message too large), and will not generate duplicate 
messages beyond the first problematic one (as happens with 
{{error.tolerance=all}}). But again, the underlying problem must be corrected 
before the connector can be restarted successfully.

This issue does not affect:
 * sink connectors;
 * source connector configurations that use {{error.tolerance=none}}, which 
is the default behavior; or
 * source connectors that never use or rely upon source offsets (a smallish 
fraction of all source connector types)

Most source connectors do rely upon source offsets, though, so this is a fairly 
serious issue.

Thanks, [~cshannon] and [~ChrisEgerton], for the quick work and review of these 
PRs. Both PRs linked above (one for the `trunk` branch and one for the `3.2` 
branch) have been merged. The `3.2` PR was merged before the first 3.2.1 RC, so 
the AK 3.2.1 release should include this fix.



[jira] [Resolved] (KAFKA-14079) Source task will not commit offsets and develops memory leak if "error.tolerance" is set to "all"

2022-07-18 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-14079.
---
  Reviewer: Randall Hauch
Resolution: Fixed



[jira] [Updated] (KAFKA-14079) Source task will not commit offsets and develops memory leak if "error.tolerance" is set to "all"

2022-07-18 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-14079:
--
Priority: Critical  (was: Major)



[jira] [Resolved] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-13770.
---
  Reviewer: Konstantine Karantasis
Resolution: Fixed

> Regression when Connect uses 0.10.x brokers due to recently added retry logic 
> in KafkaBasedLog
> --
>
> Key: KAFKA-13770
> URL: https://issues.apache.org/jira/browse/KAFKA-13770
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
> logic when trying to get the latest offsets for the topic as the 
> `KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
> read the latest offsets using retries.
> When Connect is using an old broker (version 0.10.x or earlier), the old 
> `KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
> the `TopicAdmin` method, and use the consumer to read offsets instead. The 
> new retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in 
> a `ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade 
> and use the consumer, and instead fails.
> The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
> `UnsupportedVersionException` rather than wrapping it. All other exceptions 
> from the admin client are either retriable or already wrapped by a 
> `ConnectException`. Therefore, it appears that `UnsupportedVersionException` 
> is the only special case here.
> KAFKA-12879 was backported to a lot of branches (though only the revert was 
> merged to 2.5), so this new fix should be as well. It does not appear that 
> any releases were made from any of those branches with the KAFKA-12879 
> change.
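The propagation fix described above can be sketched with a minimal retry helper (illustrative names and stand-in exception classes, not the real `TopicAdmin.retryEndOffsets(...)` implementation): the bug was that every failure, including `UnsupportedVersionException`, got wrapped in `ConnectException`, hiding the signal `KafkaBasedLog` relies on to fall back to the consumer; the fix is to let that one exception escape unwrapped.

```java
import java.util.function.Supplier;

// Stand-ins for the Kafka exception types so the sketch is self-contained.
class UnsupportedVersionException extends RuntimeException {}
class ConnectException extends RuntimeException {
    ConnectException(Throwable cause) { super(cause); }
}

// Illustrative retry wrapper, NOT the real TopicAdmin code.
class RetrySketch {
    static <T> T callWithRetries(Supplier<T> call, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (UnsupportedVersionException e) {
                // Propagate unchanged: an old broker will never start
                // supporting the API on retry, and the caller needs this
                // exact type to degrade gracefully to the consumer.
                throw e;
            } catch (RuntimeException e) {
                last = e; // possibly transient: retry, then wrap below
            }
        }
        throw new ConnectException(last);
    }
}
```

Catching the special case before the general `RuntimeException` handler is what keeps the degradation path intact while all other failures are still wrapped.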





[jira] [Updated] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-13770:
--
Fix Version/s: (was: 2.5.2)



[jira] [Commented] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512192#comment-17512192
 ] 

Randall Hauch commented on KAFKA-13770:
---

Merged the changes to `trunk` after the PR builds had no Connect-related 
failures, and a [system test run of the Connect 
tests|https://jenkins.confluent.io/view/All/job/system-test-kafka-branch-builder/4823/]
 passed.

Also backported to the `3.2`, `3.1`, `3.0`, `2.8`, `2.7`, and `2.6` branches.



[jira] [Updated] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-13770:
--
Description: 
KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
logic when trying to get the latest offsets for the topic as the 
`KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
read the latest offsets using retries.

When Connect is using an old broker (version 0.10.x or earlier), the old 
`KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
the `TopicAdmin` method, and use the consumer to read offsets instead. The new 
retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in a 
`ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade and 
use the consumer, and instead fails.

The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
`UnsupportedVersionException` rather than wrapping it. All other exceptions 
from the admin client are either retriable or already wrapped by a 
`ConnectException`. Therefore, it appears that `UnsupportedVersionException` is 
the only special case here.

KAFKA-12879 was backported to a lot of branches (though only the revert was 
merged to 2.5), so this new fix should be as well. It does not appear that any 
releases were made from any of those branches with the KAFKA-12879 change.

  was:
KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
logic when trying to get the latest offsets for the topic as the 
`KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
read the latest offsets using retries.

When Connect is using an old broker (version 0.10.x or earlier), the old 
`KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
the `TopicAdmin` method, and use the consumer to read offsets instead. The new 
retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in a 
`ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade and 
use the consumer, and instead fails.

The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
`UnsupportedVersionException` rather than wrapping it. All other exceptions 
from the admin client are either retriable or already wrapped by a 
`ConnectException`. Therefore, it appears that `UnsupportedVersionException` is 
the only special case here.

KAFKA-12879 was backported to a lot of branches, so this new fix should be as 
well. It does not appear any releases were made from any of those branches with 
the KAFKA-12879 change.


> Regression when Connect uses 0.10.x brokers due to recently added retry logic 
> in KafkaBasedLog
> --
>
> Key: KAFKA-13770
> URL: https://issues.apache.org/jira/browse/KAFKA-13770
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
> logic when trying to get the latest offsets for the topic as the 
> `KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
> read the latest offsets using retries.
> When Connect is using an old broker (version 0.10.x or earlier), the old 
> `KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
> the `TopicAdmin` method, and use the consumer to read offsets instead. The 
> new retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in 
> a `ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade 
> and use the consumer, and instead fails.
> The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
> `UnsupportedVersionException` rather than wrapping it. All other exceptions 
> from the admin client are either retriable or already wrapped by a 
> `ConnectException`. Therefore, it appears that `UnsupportedVersionException` 
> is the only special case here.
> KAFKA-12879 was backported to a lot of branches (though only the revert was 
> merged to 2.5), so this new fix should be as well. It does not appear any 
> releases were made from any of those branches with the KAFKA-12879 change.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-13770:
-

Assignee: Randall Hauch

> Regression when Connect uses 0.10.x brokers due to recently added retry logic 
> in KafkaBasedLog
> --
>
> Key: KAFKA-13770
> URL: https://issues.apache.org/jira/browse/KAFKA-13770
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
> logic when trying to get the latest offsets for the topic as the 
> `KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
> read the latest offsets using retries.
> When Connect is using an old broker (version 0.10.x or earlier), the old 
> `KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
> the `TopicAdmin` method, and use the consumer to read offsets instead. The 
> new retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in 
> a `ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade 
> and use the consumer, and instead fails.
> The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
> `UnsupportedVersionException` rather than wrapping it. All other exceptions 
> from the admin client are either retriable or already wrapped by a 
> `ConnectException`. Therefore, it appears that `UnsupportedVersionException` 
> is the only special case here.
> KAFKA-12879 was backported to a lot of branches, so this new fix should be as 
> well. It does not appear any releases were made from any of those branches 
> with the KAFKA-12879 change.





[jira] [Updated] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-13770:
--
Description: 
KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
logic when trying to get the latest offsets for the topic as the 
`KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
read the latest offsets using retries.

When Connect is using an old broker (version 0.10.x or earlier), the old 
`KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
the `TopicAdmin` method, and use the consumer to read offsets instead. The new 
retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in a 
`ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade and 
use the consumer, and instead fails.

The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
`UnsupportedVersionException` rather than wrapping it. All other exceptions 
from the admin client are either retriable or already wrapped by a 
`ConnectException`. Therefore, it appears that `UnsupportedVersionException` is 
the only special case here.

KAFKA-12879 was backported to a lot of branches, so this new fix should be as 
well. It does not appear any releases were made from any of those branches with 
the KAFKA-12879 change.

  was:
KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
logic when trying to get the latest offsets for the topic as the 
`KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
read the latest offsets using retries.

When Connect is using an old broker (version 0.10.x or earlier), the old 
`KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
the `TopicAdmin` method, and use the consumer to read offsets instead. The new 
retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in a 
`ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade and 
use the consumer, and instead fails.

The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
`UnsupportedVersionException` rather than wrapping it. All other exceptions 
from the admin client are either retriable or already wrapped by a 
`ConnectException`. Therefore, it appears that `UnsupportedVersionException` is 
the only special case here.

KAFKA-12879 was backported to a lot of branches, so this new fix should be as 
well. It does not appear any releases were made with the KAFKA-12879 change.


> Regression when Connect uses 0.10.x brokers due to recently added retry logic 
> in KafkaBasedLog
> --
>
> Key: KAFKA-13770
> URL: https://issues.apache.org/jira/browse/KAFKA-13770
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>Reporter: Randall Hauch
>Priority: Blocker
> Fix For: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
> logic when trying to get the latest offsets for the topic as the 
> `KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
> read the latest offsets using retries.
> When Connect is using an old broker (version 0.10.x or earlier), the old 
> `KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
> the `TopicAdmin` method, and use the consumer to read offsets instead. The 
> new retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in 
> a `ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade 
> and use the consumer, and instead fails.
> The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
> `UnsupportedVersionException` rather than wrapping it. All other exceptions 
> from the admin client are either retriable or already wrapped by a 
> `ConnectException`. Therefore, it appears that `UnsupportedVersionException` 
> is the only special case here.
> KAFKA-12879 was backported to a lot of branches, so this new fix should be as 
> well. It does not appear any releases were made from any of those branches 
> with the KAFKA-12879 change.





[jira] [Created] (KAFKA-13770) Regression when Connect uses 0.10.x brokers due to recently added retry logic in KafkaBasedLog

2022-03-24 Thread Randall Hauch (Jira)
Randall Hauch created KAFKA-13770:
-

 Summary: Regression when Connect uses 0.10.x brokers due to 
recently added retry logic in KafkaBasedLog
 Key: KAFKA-13770
 URL: https://issues.apache.org/jira/browse/KAFKA-13770
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
Reporter: Randall Hauch
 Fix For: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4


KAFKA-12879 recently modified Connect's `KafkaBasedLog` class to add retry 
logic when trying to get the latest offsets for the topic as the 
`KafkaBasedLog` starts up. This method calls a new method in `TopicAdmin` to 
read the latest offsets using retries.

When Connect is using an old broker (version 0.10.x or earlier), the old 
`KafkaBasedLog` logic would catch the `UnsupportedVersionException` thrown by 
the `TopicAdmin` method, and use the consumer to read offsets instead. The new 
retry logic unfortunately _wrapped_ the `UnsupportedVersionException` in a 
`ConnectException`, which means the `KafkaBasedLog` logic doesn't degrade and 
use the consumer, and instead fails.

The `TopicAdmin.retryEndOffsets(...)` method should propagate the 
`UnsupportedVersionException` rather than wrapping it. All other exceptions 
from the admin client are either retriable or already wrapped by a 
`ConnectException`. Therefore, it appears that `UnsupportedVersionException` is 
the only special case here.

KAFKA-12879 was backported to a lot of branches, so this new fix should be as 
well. It does not appear any releases were made with the KAFKA-12879 change.





[jira] [Assigned] (KAFKA-7509) Kafka Connect logs unnecessary warnings about unused configurations

2022-03-16 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-7509:


Assignee: (was: Randall Hauch)

> Kafka Connect logs unnecessary warnings about unused configurations
> ---
>
> Key: KAFKA-7509
> URL: https://issues.apache.org/jira/browse/KAFKA-7509
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Randall Hauch
>Priority: Major
>
> When running Connect, the logs contain quite a few warnings about "The 
> configuration '{}' was supplied but isn't a known config." This occurs when 
> Connect creates producers, consumers, and admin clients, because the 
> AbstractConfig is logging unused configuration properties upon construction. 
> It's complicated by the fact that the Producer, Consumer, and AdminClient all 
> create their own AbstractConfig instances within the constructor, so we can't 
> even call its {{ignore(String key)}} method.
> See also KAFKA-6793 for a similar issue with Streams.
> There are no arguments in the Producer, Consumer, or AdminClient constructors 
> to control whether the configs log these warnings, so a simpler workaround 
> is to only pass those configuration properties to the Producer, Consumer, and 
> AdminClient that the ProducerConfig, ConsumerConfig, and AdminClientConfig 
> configdefs know about.
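
The workaround described above — pass a client only the properties its config definition knows about — amounts to a simple map filter. A minimal sketch, assuming a hypothetical `onlyKnown` helper and an illustrative key set rather than the real `ProducerConfig`/`ConsumerConfig`/`AdminClientConfig` definitions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ConfigFilter {
    // Keep only the properties present in the client's known key set,
    // so the client's AbstractConfig never sees (and never warns about)
    // Connect-specific keys it does not understand.
    static Map<String, Object> onlyKnown(Map<String, Object> props, Set<String> knownKeys) {
        Map<String, Object> filtered = new HashMap<>();
        for (Map.Entry<String, Object> e : props.entrySet()) {
            if (knownKeys.contains(e.getKey())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("connector.custom.setting", "x"); // would trigger the warning
        // Illustrative key set; the real one would come from the client's ConfigDef.
        Map<String, Object> clean = onlyKnown(props, Set.of("bootstrap.servers", "acks"));
        System.out.println(clean.keySet()); // prints: [bootstrap.servers]
    }
}
```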





[jira] [Comment Edited] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-10 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504584#comment-17504584
 ] 

Randall Hauch edited comment on KAFKA-12879 at 3/10/22, 9:22 PM:
-

Update: I'm having trouble with backporting the Connect changes to the 2.5 
branch. Because this branch is so old, I'm going to just revert the AdminClient 
behavior to the 2.5 branch, and _NOT_ backport the Connect retry changes.

Note: the break due to KAFKA-12339 was never released in the 2.5 branch.


was (Author: rhauch):
Update: I'm having trouble with backporting the Connect changes to the 2.5 
branch. Because this branch is so old, I'm going to just revert the AdminClient 
behavior to the 2.5 branch, and _NOT_ backport the Connect retry changes.

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
> Fix For: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.
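
The workaround mentioned above (use `retries=1` and inspect the cause of the `TimeoutException`) boils down to walking the exception's cause chain. A hedged sketch with local stand-in exception classes, not the real Kafka types:

```java
public class TimeoutCauseCheck {
    // Local stand-ins for the Kafka exception types named in the issue;
    // these are NOT the real org.apache.kafka.common.errors classes.
    static class TimeoutException extends RuntimeException {
        TimeoutException(String msg, Throwable cause) { super(msg, cause); }
    }
    static class UnknownTopicOrPartitionException extends RuntimeException {
        UnknownTopicOrPartitionException(String msg) { super(msg); }
    }

    // Walk the cause chain to see whether the timeout was really a
    // missing-topic error, as the described workaround requires.
    static boolean causedByMissingTopic(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownTopicOrPartitionException) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable failure = new TimeoutException("listOffsets timed out",
                new UnknownTopicOrPartitionException("no such topic"));
        System.out.println(causedByMissingTopic(failure)); // prints: true
    }
}
```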





[jira] [Resolved] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-10 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-12879.
---
  Reviewer: Randall Hauch
Resolution: Fixed

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
> Fix For: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.





[jira] [Updated] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-10 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12879:
--
Fix Version/s: 3.2.0

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
> Fix For: 2.5.2, 2.8.2, 3.2.0, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.





[jira] [Updated] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-10 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12879:
--
Fix Version/s: 3.0.2
   2.7.3
   2.6.4
   2.5.2
   2.8.2
   3.1.1

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
> Fix For: 2.5.2, 2.8.2, 3.1.1, 3.0.2, 2.7.3, 2.6.4
>
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.





[jira] [Commented] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-10 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504584#comment-17504584
 ] 

Randall Hauch commented on KAFKA-12879:
---

Update: I'm having trouble with backporting the Connect changes to the 2.5 
branch. Because this branch is so old, I'm going to just revert the AdminClient 
behavior to the 2.5 branch, and _NOT_ backport the Connect retry changes.

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.





[jira] [Comment Edited] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-10 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503891#comment-17503891
 ] 

Randall Hauch edited comment on KAFKA-12879 at 3/10/22, 9:13 PM:
-

The approach we decided to take was to revert the previous admin client changes 
from KAFKA-12339 to bring the admin client behavior back to previous 
expectations, and to implement retries within the KafkaBasedLog to handle cases 
like those identified in that issue.

For example, a likely root cause of KAFKA-12339 was a Connect worker 
instantiates its KafkaConfigBackingStore (and other internal topic stores), 
which creates a KafkaBasedLog that as part of start() creates the topic if it 
doesn't exist and then immediately tries to read the offsets. That reading of 
offsets can fail if the metadata for the newly created topic hasn't been 
propagated to all of the brokers. We can solve this particular root cause 
easily by retrying the reading of offsets within the KafkaBasedLog's start() 
method, and since topic metadata should be propagated relatively quickly, we 
don't need to retry for that long – and most of the time we'd probably 
succeed within a few attempts.

I've just merged to trunk a PR that does this. When trying to backport this, 
some of the newer tests were flaky, so [~pnee] created another PR (plus 
another) to hopefully eliminate that flakiness, and it seemed to work. 

I'm in the process of backporting this all the way back to 2.6 -2.5- branch, 
since that's how far back the regression from KAFKA-12339 was backported.
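
The retry idea described in this comment — re-read the end offsets inside `KafkaBasedLog`'s start() until newly created topic metadata propagates, bounded by a deadline — can be sketched generically. The `retryWithDeadline` helper, names, and timeouts below are illustrative assumptions, not the actual KafkaBasedLog implementation:

```java
import java.util.function.Supplier;

public class StartupRetry {
    // Keep retrying an operation until a deadline, sleeping briefly
    // between attempts; rethrow the last failure once another backoff
    // would pass the deadline.
    static <T> T retryWithDeadline(Supplier<T> readOffsets,
                                   long totalTimeoutMs,
                                   long backoffMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + totalTimeoutMs;
        while (true) {
            try {
                return readOffsets.get();
            } catch (RuntimeException e) {
                if (System.currentTimeMillis() + backoffMs >= deadline) throw e;
                Thread.sleep(backoffMs);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate topic metadata that becomes available on the third attempt.
        int[] attempts = {0};
        long offset = retryWithDeadline(() -> {
            if (++attempts[0] < 3) throw new RuntimeException("metadata not yet propagated");
            return 42L;
        }, 1_000, 10);
        System.out.println("read offset " + offset + " after " + attempts[0] + " attempts");
        // prints: read offset 42 after 3 attempts
    }
}
```

Since topic metadata should propagate quickly, a short total timeout and small backoff are usually enough, which matches the reasoning in the comment above.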


was (Author: rhauch):
The approach we decided to take was to revert the previous admin client changes 
from KAFKA-12339 to bring the admin client behavior back to previous 
expectations, and to implement retries within the KafkaBasedLog to handle cases 
like those identified in that issue.

For example, a likely root cause of KAFKA-12339 was a Connect worker 
instantiates its KafkaConfigBackingStore (and other internal topic stores), 
which creates a KafkaBasedLog that as part of start() creates the topic if it 
doesn't exist and then immediately tries to read the offsets. That reading of 
offsets can fail if the metadata for the newly created topic hasn't been 
propagated to all of the brokers. We can solve this particular root cause 
easily by retrying the reading of offsets within the KafkaBasedLog's start() 
method, and since topic metadata should be propagated relatively quickly, we 
don't need to retry for that long – and most of the time we'd probably 
succeed within a few attempts.

I've just merged to trunk a PR that does this. When trying to backport this, 
some of the newer tests were flaky, so [~pnee] created another PR (plus 
another) to hopefully eliminate that flakiness, and it seemed to work. 

I'm in the process of backporting this all the way back to 2.5 branch, since 
that's how far back the regression from KAFKA-12339 was backported.

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.





[jira] [Commented] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-03-09 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503891#comment-17503891
 ] 

Randall Hauch commented on KAFKA-12879:
---

The approach we decided to take was to revert the previous admin client changes 
from KAFKA-12339 to bring the admin client behavior back to previous 
expectations, and to implement retries within the KafkaBasedLog to handle cases 
like those identified in that issue.

For example, a likely root cause of KAFKA-12339 was a Connect worker 
instantiates its KafkaConfigBackingStore (and other internal topic stores), 
which creates a KafkaBasedLog that as part of start() creates the topic if it 
doesn't exist and then immediately tries to read the offsets. That reading of 
offsets can fail if the metadata for the newly created topic hasn't been 
propagated to all of the brokers. We can solve this particular root cause 
easily by retrying the reading of offsets within the KafkaBasedLog's start() 
method, and since topic metadata should be propagated relatively quickly, we 
don't need to retry for that long – and most of the time we'd probably 
succeed within a few attempts.

I've just merged to trunk a PR that does this. When trying to backport this, 
some of the newer tests were flaky, so [~pnee] created another PR (plus 
another) to hopefully eliminate that flakiness, and it seemed to work. 

I'm in the process of backporting this all the way back to 2.5 branch, since 
that's how far back the regression from KAFKA-12339 was backported.

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Philip Nee
>Priority: Major
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.





[jira] [Comment Edited] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-01-25 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481998#comment-17481998
 ] 

Randall Hauch edited comment on KAFKA-12879 at 1/25/22, 5:52 PM:
-

The original intent of 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339]'s changes was 
to retry the `listOffsets(...)` if a retriable exception were thrown, as other 
methods within the AdminClient automatically handle retries. In hindsight, I 
should have sought clarification on that change, since 
[KIP-396|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484]
 that added `listOffsets(...)` was ambiguous about retries while 
[KIP-117|https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations]
 that added `AdminClient` included automatic retry support.

Having said that, we need to decide whether to:
1. Revert the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` does not retry. IMO this would leave the `AdminClient` in a 
strange state where some methods retry and others don't, with no documentation 
about which methods do and do not retry. We would also have to change the 
Connect code that uses this to perform the retries, though that's doable.
2. Keep the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` retries on retriable exceptions, but throws 
`UnknownTopicOrPartitionException` when the topic does not exist (after 
successive retries) rather than the timeout exception.
3. Keep as-is and simply better document the behavior, perhaps by making an 
addendum to KIP-396.

WDYT, [~mimaison], [~cmccabe], and others?


was (Author: rhauch):
The original intent of 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339]'s changes was 
to retry the `listOffsets(...)` if a retriable exception were thrown, as other 
methods within the AdminClient automatically handle retries. In hindsight, I 
should have sought clarification on that change, since 
[KIP-396|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484]
 that added `listOffsets(...)` was ambiguous about retries while 
[KIP-117|https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations]
 that added `AdminClient` included automatic retry support.

Having said that, we need to decide whether to:
1. Revert the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` does not retry. IMO this would leave the `AdminClient` in a 
strange state where some methods retry and others don't, with no documentation 
about which methods do and do not retry. We would also have to change the 
Connect code that uses this to perform the retries, though that's doable.
2. Keep the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` retries on retriable exceptions, but throws 
`UnknownTopicOrPartitionException` when the topic does not exist (after 
exhausting retries) rather than the timeout exception.
3. Keep as-is and simply better document the behavior.

I suspect option 3 is not really acceptable.

WDYT, [~mimaison], [~cmccabe], and others?

> Compatibility break in Admin.listOffsets()
> --
>
> Key: KAFKA-12879
> URL: https://issues.apache.org/jira/browse/KAFKA-12879
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 2.8.0, 2.7.1, 2.6.2
>Reporter: Tom Bentley
>Assignee: Kirk True
>Priority: Major
>
> KAFKA-12339 incompatibly changed the semantics of Admin.listOffsets(). 
> Previously it would fail with {{UnknownTopicOrPartitionException}} when a 
> topic didn't exist. Now it will (eventually) fail with {{TimeoutException}}. 
> It seems this was more or less intentional, even though it would break code 
> which was expecting and handling the {{UnknownTopicOrPartitionException}}. A 
> workaround is to use {{retries=1}} and inspect the cause of the 
> {{TimeoutException}}, but this isn't really suitable for cases where the same 
> Admin client instance is being used for other calls where retries is 
> desirable.
> Furthermore as well as the intended effect on {{listOffsets()}} it seems that 
> the change could actually affect other methods of Admin.
> More generally, the Admin client API is vague about which exceptions can 
> propagate from which methods. This means that it's not possible to say, in 
> cases like this, whether the calling code _should_ have been relying on the 
> {{UnknownTopicOrPartitionException}} or not.
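The workaround mentioned in the description — `retries=1` plus inspecting the cause of the `TimeoutException` — can be sketched as follows. This is a hedged illustration that uses stand-in exception classes rather than the real `org.apache.kafka.common.errors` types, so it compiles without a kafka-clients dependency; the exact cause chain the admin client builds may differ.

```java
import java.util.concurrent.ExecutionException;

public class ListOffsetsWorkaround {

    // Stand-ins for org.apache.kafka.common.errors.* so this sketch is
    // self-contained; in real code, import the Kafka exception types instead.
    static class UnknownTopicOrPartitionException extends RuntimeException {
        UnknownTopicOrPartitionException(String msg) { super(msg); }
    }

    static class TimeoutException extends RuntimeException {
        TimeoutException(String msg, Throwable cause) { super(msg, cause); }
    }

    // With retries=1 the admin call fails fast, and the TimeoutException may
    // carry the original UnknownTopicOrPartitionException as its cause. This
    // helper restores the pre-KAFKA-12339 exception for callers that still
    // expect it, and otherwise surfaces the cause unchanged.
    static RuntimeException unwrap(ExecutionException e) {
        Throwable cause = e.getCause();
        if (cause instanceof TimeoutException
                && cause.getCause() instanceof UnknownTopicOrPartitionException) {
            return (UnknownTopicOrPartitionException) cause.getCause();
        }
        return cause instanceof RuntimeException
                ? (RuntimeException) cause
                : new RuntimeException(cause);
    }
}
```

As the description notes, this only works when the whole `Admin` instance can live with `retries=1`; it is not suitable when the same client serves calls where retries are desirable.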



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-12879) Compatibility break in Admin.listOffsets()

2022-01-25 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481998#comment-17481998
 ] 

Randall Hauch commented on KAFKA-12879:
---

The original intent of 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339]'s changes was 
to retry `listOffsets(...)` if a retriable exception was thrown, as other 
methods within the AdminClient automatically handle retries. In hindsight, I 
should have sought clarification on that change, since 
[KIP-396|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484]
 that added `listOffsets(...)` was ambiguous about retries while 
[KIP-117|https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations]
 that added `AdminClient` included automatic retry support.

Having said that, we need to decide whether to:
1. Revert the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` does not retry. IMO this would leave the `AdminClient` in a 
strange state where some methods retry and others don't, with no documentation 
about which methods do and do not retry. We would also have to change the 
Connect code that uses this to perform the retries, though that's doable.
2. Keep the changes from 
[KAFKA-12339|https://issues.apache.org/jira/browse/KAFKA-12339] so that 
`listOffsets(...)` retries on retriable exceptions, but throws 
`UnknownTopicOrPartitionException` when the topic does not exist (after 
exhausting retries) rather than the timeout exception.
3. Keep as-is and simply better document the behavior.

I suspect option 3 is not really acceptable.

WDYT, [~mimaison], [~cmccabe], and others?






[jira] [Updated] (KAFKA-13469) End-of-life offset commit for source task can take place before all records are flushed

2021-11-30 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-13469:
--
Fix Version/s: 3.2.0

> End-of-life offset commit for source task can take place before all records 
> are flushed
> ---
>
> Key: KAFKA-13469
> URL: https://issues.apache.org/jira/browse/KAFKA-13469
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1, 3.2.0
>
>
> When we fixed KAFKA-12226, we made offset commits for source tasks take place 
> without blocking for any in-flight records to be acknowledged. While a task 
> is running, this change should yield significant benefits in some cases and 
> allow us to continue to commit offsets even when a topic partition on the 
> broker is unavailable or the producer is unable to send records to Kafka as 
> quickly as they are produced by the task.
> However, this becomes problematic when a task is scheduled for shutdown with 
> in-flight records. During shutdown, the latest committable offsets are 
> calculated, and then flushed to the offset backing store (in distributed 
> mode, this is the offsets topic). During that flush, the task's producer may 
> continue to send records to Kafka, but their offsets will not be committed, 
> which causes these records to be redelivered if/when the task is restarted.
> Essentially, duplicate records are now possible even in healthy source tasks.
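The race above can be seen with a minimal model of committable offsets. This is a hypothetical sketch, not the actual `WorkerSourceTask` logic: only the contiguous prefix of acknowledged records is committable, so any record still in flight when the end-of-life flush snapshots state is re-delivered after a restart.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical model: records carry offsets 0, 1, 2, ... and an offset becomes
// committable only once every record before it has been acknowledged.
public class CommittableOffsets {
    private final ConcurrentSkipListSet<Long> acked = new ConcurrentSkipListSet<>();

    // Called from the producer callback when a record's send completes.
    void ack(long offset) { acked.add(offset); }

    // Highest offset N such that 0..N are all acknowledged; -1 if none are.
    long committable() {
        long next = 0;
        while (acked.contains(next)) next++;
        return next - 1;
    }
}
```

If the final flush runs while offset 2 is still in flight, only offsets up to 1 are committed, so records from offset 2 onward are re-delivered on restart even though the producer may have written them to Kafka successfully.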





[jira] [Comment Edited] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444638#comment-17444638
 ] 

Randall Hauch edited comment on KAFKA-13370 at 11/16/21, 5:01 PM:
--

I reverted the change (https://github.com/apache/kafka/pull/9642) that caused 
this regression in the following branches:
* `2.8` for inclusion in a future 2.8.2 release
* `3.0` for inclusion in a future 3.0.1 release
* `3.1` for inclusion in the upcoming 3.1.0 release
* `trunk` for inclusion in the next major/minor release (e.g., 3.2.0 or 4.0.0)




was (Author: rhauch):
I reverted the change (https://github.com/apache/kafka/pull/9642) that caused 
this regression in the following branches:
* `2.8` for inclusion in a future 2.8.2 release
* `3.0` for inclusion in a future 3.0.1 release
* `3.1` for inclusion in the upcoming 3.1.0 release
* `trunk` for inclusion in the next major/minor release (e.g., 3.2.0 or 4.0.0)

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1, 2.8.2
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e. the 
> *offset-commit-failure-percentage* metric would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *Timeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  
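The miscount described above can be reproduced in miniature. The class below is a hypothetical simplification of Connect's `TaskMetricsGroup`, not the actual `WorkerTask` code: when a timeout path reports a failure but has no `Throwable` to pass along, inferring success from a null error silently counts that failure as a success.

```java
// Hypothetical simplification; the real TaskMetricsGroup records rates through
// the metrics library, but the success/failure counting logic is analogous.
public class CommitMetrics {
    private int attempts = 0;
    private int failures = 0;

    // Pre-regression shape: success is an explicit argument, so a timeout with
    // no accompanying Throwable can still be recorded as a failure.
    void recordCommit(boolean success, Throwable error) {
        attempts++;
        if (!success) failures++;
    }

    // Post-regression shape: success is inferred from the error, so a timeout
    // reported with error == null is counted as a success.
    void recordCommitInferred(Throwable error) {
        attempts++;
        if (error != null) failures++;
    }

    double failurePercentage() {
        return attempts == 0 ? 0.0 : (double) failures / attempts;
    }
}
```

With one successful commit and one timed-out commit that carries no exception object, the explicit variant reports 50% failures while the inferred variant reports 0% — the regression the metric users observed.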





[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444638#comment-17444638
 ] 

Randall Hauch commented on KAFKA-13370:
---

I reverted the change (https://github.com/apache/kafka/pull/9642) that caused 
this regression in the following branches:
* `2.8` for inclusion in a future 2.8.2 release
* `3.0` for inclusion in a future 3.0.1 release
* `3.1` for inclusion in the upcoming 3.1.0 release
* `trunk` for inclusion in the next major/minor release (e.g., 3.2.0 or 4.0.0)






[jira] [Updated] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-13370:
--
Fix Version/s: 2.8.2






[jira] [Comment Edited] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444615#comment-17444615
 ] 

Randall Hauch edited comment on KAFKA-13370 at 11/16/21, 3:47 PM:
--

Thanks, [~showuon], for creating the PR that fixes this issue. But I think the 
best course of action here is actually to revert 
https://github.com/apache/kafka/pull/9642, for a few reasons:

# It was actually useful to have that boolean success parameter within the 
protected TaskMetricsGroup.recordCommit(...) helper method, since it allowed 
the WorkerTask.recordCommitSuccess(...) and recordCommitFailure(...) methods to 
call that one method with desired behavior.
# Rolling back is also more encapsulated and significantly easier to backport 
all the way back to the 2.8 branch, especially since we've significantly 
refactored the source connector offset commit logic only in 3.0 and later 
branches.

So I think I'm going to close your PR and revert #9642.

However, I think your unit test improvements to verify the expected metric 
values are very useful. Would you mind creating a new PR with those unit test 
improvements? Since this is a blocker issue for 3.1, it'd be great to do that 
quickly. If that's not feasible (or I don't hear back in the next few days), 
then let's create a new issue for the additional unit test metric verification 
and associate the PR with that new issue.

Thanks!


was (Author: rhauch):
Thanks, [~showuon], for creating the PR that fixes this issue. But I think the 
best course of action here is actually to revert #9642, for a few reasons:

# It was actually useful to have that boolean success parameter within the 
protected TaskMetricsGroup.recordCommit(...) helper method, since it allowed 
the WorkerTask.recordCommitSuccess(...) and recordCommitFailure(...) methods to 
call that one method with desired behavior.
# Rolling back is also more encapsulated and significantly easier to backport 
all the way back to the 2.8 branch, especially since we've significantly 
refactored the source connector offset commit logic only in 3.0 and later 
branches.

So I think I'm going to close your PR and revert #9642.

However, I think your unit test improvements to verify the expected metric 
values are very useful. Would you mind creating a new PR with those unit test 
improvements? Since this is a blocker issue for 3.1, it'd be great to do that 
quickly. If that's not feasible (or I don't hear back in the next few days), 
then let's create a new issue for the additional unit test metric verification 
and associate the PR with that new issue.

Thanks!

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e. the 
> *offset-commit-failure-percentage* metric would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *Timeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> 

[jira] [Commented] (KAFKA-13370) Offset commit failure percentage metric is not computed correctly (regression)

2021-11-16 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444615#comment-17444615
 ] 

Randall Hauch commented on KAFKA-13370:
---

Thanks, [~showuon], for creating the PR that fixes this issue. But I think the 
best course of action here is actually to revert #9642, for a few reasons:

# It was actually useful to have that boolean success parameter within the 
protected TaskMetricsGroup.recordCommit(...) helper method, since it allowed 
the WorkerTask.recordCommitSuccess(...) and recordCommitFailure(...) methods to 
call that one method with desired behavior.
# Rolling back is also more encapsulated and significantly easier to backport 
all the way back to the 2.8 branch, especially since we've significantly 
refactored the source connector offset commit logic only in 3.0 and later 
branches.

So I think I'm going to close your PR and revert #9642.

However, I think your unit test improvements to verify the expected metric 
values are very useful. Would you mind creating a new PR with those unit test 
improvements? Since this is a blocker issue for 3.1, it'd be great to do that 
quickly. If that's not feasible (or I don't hear back in the next few days), 
then let's create a new issue for the additional unit test metric verification 
and associate the PR with that new issue.

Thanks!

> Offset commit failure percentage metric is not computed correctly (regression)
> --
>
> Key: KAFKA-13370
> URL: https://issues.apache.org/jira/browse/KAFKA-13370
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, metrics
>Affects Versions: 2.8.0
> Environment: Confluent Platform Helm Chart (v6.2.0)
>Reporter: Vincent Giroux
>Assignee: Luke Chen
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> There seems to have been a regression in the way the offset-commit-* metrics 
> are calculated for *source* Kafka Connect connectors since version 2.8.0.
> Before this version, any timeout or interruption while trying to commit 
> offsets for source connectors (e.g. MM2 MirrorSourceConnector) would get 
> correctly flagged as an offset commit failure (i.e. the 
> *offset-commit-failure-percentage* metric would be non-zero). Since 
> version 2.8.0, these errors are considered as successes.
> After digging through the code, the commit where this bug was introduced 
> appears to be this one : 
> [https://github.com/apache/kafka/commit/047ad654da7903f3903760b0e6a6a58648ca7715]
> I believe removing the boolean *success* argument in the *recordCommit* 
> method of the *WorkerTask* class (argument deemed redundant because of the 
> presence of the Throwable *error* argument) and only considering the presence 
> of a non-null error to determine if a commit is a success or failure might be 
> a mistake. This is because in the *commitOffsets* method of the 
> *WorkerSourceTask* class, there are multiple cases where an exception object 
> is either not available or is not passed to the *recordCommitFailure* method, 
> e.g. :
>  * *Timeout #1* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L519]
>  
>  * *Timeout #2* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L584]
>  
>  * *Interruption* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L529]
>  
>  * *Unserializable offset* : 
> [https://github.com/apache/kafka/blob/2.8/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L562]
>  
>  





[jira] [Updated] (KAFKA-12226) High-throughput source tasks fail to commit offsets

2021-11-15 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12226:
--
Fix Version/s: 3.0.1
   3.2.0

> High-throughput source tasks fail to commit offsets
> ---
>
> Key: KAFKA-12226
> URL: https://issues.apache.org/jira/browse/KAFKA-12226
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.2.0
>
>
> The current source task thread has the following workflow:
>  # Poll messages from the source task
>  # Queue these messages to the producer and send them to Kafka asynchronously.
>  # Add the message to outstandingMessages, or if a flush is currently active, 
> outstandingMessagesBacklog
>  # When the producer completes the send of a record, remove it from 
> outstandingMessages
> The commit offsets thread has the following workflow:
>  # Wait a flat timeout for outstandingMessages to flush completely
>  # If this times out, add all of the outstandingMessagesBacklog to the 
> outstandingMessages and reset
>  # If it succeeds, commit the source task offsets to the backing store.
>  # Retry the above on a fixed schedule
> If the source task is producing records quickly (faster than the producer can 
> send), then the producer will throttle the task thread by blocking in its 
> {{send}} method, waiting at most {{max.block.ms}} for space in the 
> {{buffer.memory}} to be available. This means that the number of records in 
> {{outstandingMessages}} + {{outstandingMessagesBacklog}} is proportional to 
> the size of the producer memory buffer.
> This amount of data might take more than {{offset.flush.timeout.ms}} to 
> flush, and thus the flush will never succeed while the source task is 
> rate-limited by the producer memory. This means that we may write multiple 
> hours of data to Kafka and not ever commit source offsets for the connector. 
> When the task is lost due to a worker failure, hours of data will be 
> re-processed that otherwise were successfully written to Kafka.
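The starvation above is easy to model. The sketch below is a deterministic toy, not Connect's actual commit scheduler: each flush attempt drains only what the flat `offset.flush.timeout.ms` budget allows, and if the task refills the producer buffer faster than that, `outstandingMessages` never reaches zero and offsets are never committed.

```java
// Toy model of the commit loop: drainPerAttempt is how many in-flight records
// the producer can acknowledge within offset.flush.timeout.ms, and
// refillPerAttempt is how many new records the task produces in that window.
public class FlushSimulation {
    static int successfulFlushes(int outstanding, int drainPerAttempt,
                                 int refillPerAttempt, int attempts) {
        int commits = 0;
        for (int i = 0; i < attempts; i++) {
            outstanding = Math.max(0, outstanding - drainPerAttempt);
            if (outstanding == 0) commits++;  // flush completed in time; offsets committed
            outstanding += refillPerAttempt;  // task keeps filling buffer.memory
        }
        return commits;
    }
}
```

With a buffer-sized backlog and a refill rate above the drain rate (e.g. `successfulFlushes(100, 10, 20, 50)`), no flush ever completes within the timeout; with no refill, every attempt commits.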





[jira] [Commented] (KAFKA-12226) High-throughput source tasks fail to commit offsets

2021-11-15 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443975#comment-17443975
 ] 

Randall Hauch commented on KAFKA-12226:
---

Thanks for catching this, [~dajac]. I did miss merging this to the `3.1` branch 
-- I recall at the time looking for the branch and not seeing it. I'm in the 
process of building the branch after merging locally, so you should see this a 
bit later today.


> High-throughput source tasks fail to commit offsets
> ---
>
> Key: KAFKA-12226
> URL: https://issues.apache.org/jira/browse/KAFKA-12226
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.1.0
>
>
> The current source task thread has the following workflow:
>  # Poll messages from the source task
>  # Queue these messages to the producer and send them to Kafka asynchronously.
>  # Add the message to outstandingMessages, or if a flush is currently active, 
> outstandingMessagesBacklog
>  # When the producer completes the send of a record, remove it from 
> outstandingMessages
> The commit offsets thread has the following workflow:
>  # Wait a flat timeout for outstandingMessages to flush completely
>  # If this times out, add all of the outstandingMessagesBacklog to the 
> outstandingMessages and reset
>  # If it succeeds, commit the source task offsets to the backing store.
>  # Retry the above on a fixed schedule
> If the source task is producing records quickly (faster than the producer can 
> send), then the producer will throttle the task thread by blocking in its 
> {{send}} method, waiting at most {{max.block.ms}} for space in the 
> {{buffer.memory}} to be available. This means that the number of records in 
> {{outstandingMessages}} + {{outstandingMessagesBacklog}} is proportional to 
> the size of the producer memory buffer.
> This amount of data might take more than {{offset.flush.timeout.ms}} to 
> flush, and thus the flush will never succeed while the source task is 
> rate-limited by the producer memory. This means that we may write multiple 
> hours of data to Kafka and not ever commit source offsets for the connector. 
> When the task is lost due to a worker failure, hours of data will be 
> re-processed that otherwise were successfully written to Kafka.





[jira] [Commented] (KAFKA-12226) High-throughput source tasks fail to commit offsets

2021-11-07 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440067#comment-17440067
 ] 

Randall Hauch commented on KAFKA-12226:
---

Merged the PR to the `trunk` branch, which will be included in the upcoming 
3.1.0 branch. 






[jira] [Updated] (KAFKA-12226) High-throughput source tasks fail to commit offsets

2021-11-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12226:
--
Fix Version/s: 3.1.0

> High-throughput source tasks fail to commit offsets
> ---
>
> Key: KAFKA-12226
> URL: https://issues.apache.org/jira/browse/KAFKA-12226
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.1.0
>
>
> The current source task thread has the following workflow:
>  # Poll messages from the source task
>  # Queue these messages to the producer and send them to Kafka asynchronously.
>  # Add the message to outstandingMessages, or if a flush is currently active, 
> outstandingMessagesBacklog
>  # When the producer completes the send of a record, remove it from 
> outstandingMessages
> The commit offsets thread has the following workflow:
>  # Wait a flat timeout for outstandingMessages to flush completely
>  # If this times out, add all of the outstandingMessagesBacklog to the 
> outstandingMessages and reset
>  # If it succeeds, commit the source task offsets to the backing store.
>  # Retry the above on a fixed schedule
> If the source task is producing records quickly (faster than the producer can 
> send), then the producer will throttle the task thread by blocking in its 
> {{send}} method, waiting at most {{max.block.ms}} for space in the 
> {{buffer.memory}} to be available. This means that the number of records in 
> {{outstandingMessages}} + {{outstandingMessagesBacklog}} is proportional to 
> the size of the producer memory buffer.
> This amount of data might take more than {{offset.flush.timeout.ms}} to 
> flush, and thus the flush will never succeed while the source task is 
> rate-limited by the producer memory. This means that we may write multiple 
> hours of data to Kafka and not ever commit source offsets for the connector. 
> When the task is lost due to a worker failure, hours of data will be 
> re-processed that otherwise were successfully written to Kafka.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KAFKA-9887) failed-task-count JMX metric not updated if task fails during startup

2021-10-28 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435681#comment-17435681
 ] 

Randall Hauch commented on KAFKA-9887:
--

Backported to the `3.0` branch, so it will be included in 3.0.1 when it is 
released. Updated the FIX VERSION on this issue.

> failed-task-count JMX metric not updated if task fails during startup
> -
>
> Key: KAFKA-9887
> URL: https://issues.apache.org/jira/browse/KAFKA-9887
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
>Reporter: Chris Egerton
>Assignee: Michael Carter
>Priority: Major
> Fix For: 3.1.0, 2.7.2, 2.8.1, 3.0.1
>
>
> If a task fails on startup (specifically, during [this code 
> section|https://github.com/apache/kafka/blob/00a59b392d92b0d6d3a321ef9a53dae4b3a9d030/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L427-L468]),
>  the {{failed-task-count}} JMX metric is not updated to reflect the task 
> failure, even though the status endpoints in the REST API do report the task 
> as failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9887) failed-task-count JMX metric not updated if task fails during startup

2021-10-28 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9887:
-
Fix Version/s: 3.0.1

> failed-task-count JMX metric not updated if task fails during startup
> -
>
> Key: KAFKA-9887
> URL: https://issues.apache.org/jira/browse/KAFKA-9887
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
>Reporter: Chris Egerton
>Assignee: Michael Carter
>Priority: Major
> Fix For: 3.1.0, 2.7.2, 2.8.1, 3.0.1
>
>
> If a task fails on startup (specifically, during [this code 
> section|https://github.com/apache/kafka/blob/00a59b392d92b0d6d3a321ef9a53dae4b3a9d030/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L427-L468]),
>  the {{failed-task-count}} JMX metric is not updated to reflect the task 
> failure, even though the status endpoints in the REST API do report the task 
> as failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13118) Backport KAFKA-9887 to 3.0 branch after 3.0.0 release

2021-10-28 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-13118.
---
Resolution: Fixed

Backported to the `3.0` branch. Will be in the 3.0.1 release.

> Backport KAFKA-9887 to 3.0 branch after 3.0.0 release
> -
>
> Key: KAFKA-13118
> URL: https://issues.apache.org/jira/browse/KAFKA-13118
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Affects Versions: 3.0.1
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 3.0.1
>
>
> We need to backport the fix (commit hash `0314801a8e`) for KAFKA-9887 to the 
> `3.0` branch. That fix was merged to `trunk`, `2.8`, and `2.7` _after_ the 
> 3.0 code freeze, and that issue is not a blocker or regression.
> Be sure to update the "fix version" on KAFKA-9887 when the backport is 
> complete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10865) Improve trace-logging for Transformations (including Predicates)

2021-10-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-10865:
-

Assignee: Tiago Martins  (was: Govinda Sakhare)

> Improve trace-logging for Transformations (including Predicates)
> 
>
> Key: KAFKA-10865
> URL: https://issues.apache.org/jira/browse/KAFKA-10865
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Robin Moffatt
>Assignee: Tiago Martins
>Priority: Major
>  Labels: newbie
>
> I've been spending [a bunch of time poking around 
> SMTs|https://rmoff.net/categories/twelvedaysofsmt/] recently, and one common 
> challenge I've had is being able to debug when things don't behave as I 
> expect.
>   
>  I know that there is the {{TransformationChain}} logger, but this only gives 
> (IIUC) the input record
> {code:java}
> [2020-12-17 09:38:58,057] TRACE [sink-simulator-day12-00|task-0] Applying 
> transformation io.confluent.connect.transforms.Filter$Value to 
> SinkRecord{kafkaOffset=10551, timestampType=CreateTime} 
> ConnectRecord{topic='day12-sys01', kafkaPartition=0, 
> key=2c2ceb9b-8b31-4ade-a757-886ebfb7a398, keySchema=Schema{STRING}, 
> value=Struct{units=16,product=Founders Breakfast 
> Stout,amount=30.41,txn_date=Sat Dec 12 18:21:18 GMT 2020,source=SYS01}, 
> valueSchema=Schema{io.mdrogalis.Gen0:STRUCT}, timestamp=1608197938054, 
> headers=ConnectHeaders(headers=)} 
> (org.apache.kafka.connect.runtime.TransformationChain:47)
> {code}
>  
>  I think it would be really useful to also have trace level logging that 
> included:
>  - the _output_ of *each* transform
>  - the evaluation and result of any `predicate`s
> I have been using 
> {{com.github.jcustenborder.kafka.connect.simulator.SimulatorSinkConnector}} 
> which is really useful for seeing the final record:
> {code:java}
> [2020-12-17 09:38:58,057] INFO [sink-simulator-day12-00|task-0] 
> record.value=Struct{units=16,product=Founders Breakfast 
> Stout,amount=30.41,txn_date=Sat Dec 12 18:21:18 GMT 2020,source=SYS01} 
> (com.github.jcustenborder.kafka.connect.simulator.SimulatorSinkTask:50)
> {code}
>  
>  But it doesn't include things like the topic name (which is often changed by 
> common SMTs)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10865) Improve trace-logging for Transformations (including Predicates)

2021-10-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-10865:
--

[~upsidedownsmile], I've assigned this issue to you as requested. You are now 
listed as a contributor on this Jira project, so you can self-assign issues and 
make a few other changes.

[~govi20], I hope that's okay. If not, please respond here.

> Improve trace-logging for Transformations (including Predicates)
> 
>
> Key: KAFKA-10865
> URL: https://issues.apache.org/jira/browse/KAFKA-10865
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Robin Moffatt
>Assignee: Govinda Sakhare
>Priority: Major
>  Labels: newbie
>
> I've been spending [a bunch of time poking around 
> SMTs|https://rmoff.net/categories/twelvedaysofsmt/] recently, and one common 
> challenge I've had is being able to debug when things don't behave as I 
> expect.
>   
>  I know that there is the {{TransformationChain}} logger, but this only gives 
> (IIUC) the input record
> {code:java}
> [2020-12-17 09:38:58,057] TRACE [sink-simulator-day12-00|task-0] Applying 
> transformation io.confluent.connect.transforms.Filter$Value to 
> SinkRecord{kafkaOffset=10551, timestampType=CreateTime} 
> ConnectRecord{topic='day12-sys01', kafkaPartition=0, 
> key=2c2ceb9b-8b31-4ade-a757-886ebfb7a398, keySchema=Schema{STRING}, 
> value=Struct{units=16,product=Founders Breakfast 
> Stout,amount=30.41,txn_date=Sat Dec 12 18:21:18 GMT 2020,source=SYS01}, 
> valueSchema=Schema{io.mdrogalis.Gen0:STRUCT}, timestamp=1608197938054, 
> headers=ConnectHeaders(headers=)} 
> (org.apache.kafka.connect.runtime.TransformationChain:47)
> {code}
>  
>  I think it would be really useful to also have trace level logging that 
> included:
>  - the _output_ of *each* transform
>  - the evaluation and result of any `predicate`s
> I have been using 
> {{com.github.jcustenborder.kafka.connect.simulator.SimulatorSinkConnector}} 
> which is really useful for seeing the final record:
> {code:java}
> [2020-12-17 09:38:58,057] INFO [sink-simulator-day12-00|task-0] 
> record.value=Struct{units=16,product=Founders Breakfast 
> Stout,amount=30.41,txn_date=Sat Dec 12 18:21:18 GMT 2020,source=SYS01} 
> (com.github.jcustenborder.kafka.connect.simulator.SimulatorSinkTask:50)
> {code}
>  
>  But it doesn't include things like the topic name (which is often changed by 
> common SMTs)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-2376) Add Kafka Connect metrics

2021-07-29 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389959#comment-17389959
 ] 

Randall Hauch commented on KAFKA-2376:
--

[~ramkrish1489], it's not uncommon for a KIP to propose metrics that are 
implemented in multiple releases. In this case, I think other things took 
priority.

We'd welcome any contributions, though in this case it'd be through a new KAFKA 
issue that references this issue and/or the KIP.

> Add Kafka Connect metrics
> -
>
> Key: KAFKA-2376
> URL: https://issues.apache.org/jira/browse/KAFKA-2376
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Randall Hauch
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 1.0.0
>
>
> Kafka Connect needs good metrics for monitoring since that will be the 
> primary insight into the health of connectors as they copy data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10627) Connect TimestampConverter transform does not support multiple formats for the same field and only allows one field to be transformed at a time

2021-07-28 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388779#comment-17388779
 ] 

Randall Hauch commented on KAFKA-10627:
---

Thanks, [~joshuagrisham]! I've added you as a contributor to this Jira project, 
and assigned this issue to you since you've created the KIP and a PR.

I'll take a look at the proposed KIP and respond on the KIP discussion thread.

> Connect TimestampConverter transform does not support multiple formats for 
> the same field and only allows one field to be transformed at a time
> ---
>
> Key: KAFKA-10627
> URL: https://issues.apache.org/jira/browse/KAFKA-10627
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Joshua Grisham
>Assignee: Joshua Grisham
>Priority: Minor
>  Labels: connect-transformation, need-kip
>
> Some of the limitations of the *TimestampConverter* transform are causing 
> issues for us since we have a lot of different producers from different 
> systems producing events to some of our topics.  We try our best to have 
> governance on the data formats including strict usage of Avro schemas but 
> there are still variations in timestamp data types that are allowed by the 
> schema.
> In the end there will be multiple formats coming into the same timestamp 
> fields (for example, with and without milliseconds, with and without a 
> timezone specifier, etc).
> And then you get failed events in Connect with messages like this:
> {noformat}
> org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error 
> handler
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:514)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
>   at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
>   at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.connect.errors.DataException: Could not parse 
> timestamp: value (2020-10-06T12:12:27Z) does not match pattern (yyyy-MM-dd'T'HH:mm:ss.SSSX)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.convertTimestamp(TimestampConverter.java:450)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyValueWithSchema(TimestampConverter.java:375)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyWithSchema(TimestampConverter.java:362)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.apply(TimestampConverter.java:279)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
>   ... 14 more
> Caused by: java.text.ParseException: Unparseable date: "2020-10-06T12:12:27Z"
>   at java.text.DateFormat.parse(DateFormat.java:366)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   ... 21 more
> {noformat}
>  
> My thinking is that maybe a good solution is to switch from *java.util.Date* 
> to the *java.time* API, and from *SimpleDateFormat* to *DateTimeFormatter*, 
> which allows more sophisticated patterns in the config to match multiple 
> different allowable formats.
> For example instead of effectively doing this:
> {code:java}
> SimpleDateFormat format = new 
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");{code}
> It can be something like this:
> {code:java}
> DateTimeFormatter format = DateTimeFormatter.ofPattern("[yyyy-MM-dd[['T'][ 
> ]HH:mm:ss[.SSSz][.SSS[XXX][X");{code}

[jira] [Assigned] (KAFKA-10627) Connect TimestampConverter transform does not support multiple formats for the same field and only allows one field to be transformed at a time

2021-07-28 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-10627:
-

Assignee: Joshua Grisham

> Connect TimestampConverter transform does not support multiple formats for 
> the same field and only allows one field to be transformed at a time
> ---
>
> Key: KAFKA-10627
> URL: https://issues.apache.org/jira/browse/KAFKA-10627
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Joshua Grisham
>Assignee: Joshua Grisham
>Priority: Minor
>  Labels: connect-transformation, need-kip
>
> Some of the limitations of the *TimestampConverter* transform are causing 
> issues for us since we have a lot of different producers from different 
> systems producing events to some of our topics.  We try our best to have 
> governance on the data formats including strict usage of Avro schemas but 
> there are still variations in timestamp data types that are allowed by the 
> schema.
> In the end there will be multiple formats coming into the same timestamp 
> fields (for example, with and without milliseconds, with and without a 
> timezone specifier, etc).
> And then you get failed events in Connect with messages like this:
> {noformat}
> org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error 
> handler
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:514)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
>   at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
>   at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.connect.errors.DataException: Could not parse 
> timestamp: value (2020-10-06T12:12:27Z) does not match pattern (yyyy-MM-dd'T'HH:mm:ss.SSSX)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.convertTimestamp(TimestampConverter.java:450)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyValueWithSchema(TimestampConverter.java:375)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyWithSchema(TimestampConverter.java:362)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.apply(TimestampConverter.java:279)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
>   ... 14 more
> Caused by: java.text.ParseException: Unparseable date: "2020-10-06T12:12:27Z"
>   at java.text.DateFormat.parse(DateFormat.java:366)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   ... 21 more
> {noformat}
>  
> My thinking is that maybe a good solution is to switch from *java.util.Date* 
> to the *java.time* API, and from *SimpleDateFormat* to *DateTimeFormatter*, 
> which allows more sophisticated patterns in the config to match multiple 
> different allowable formats.
> For example instead of effectively doing this:
> {code:java}
> SimpleDateFormat format = new 
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");{code}
> It can be something like this:
> {code:java}
> DateTimeFormatter format = DateTimeFormatter.ofPattern("[yyyy-MM-dd[['T'][ 
> ]HH:mm:ss[.SSSz][.SSS[XXX][X");{code}
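As an illustrative sketch of the reporter's suggestion (not the actual Connect {{TimestampConverter}} API), a single {{DateTimeFormatter}} with optional sections can accept variants that would each need a separate {{SimpleDateFormat}} pattern; the pattern and class names below are hand-written assumptions:

```java
// Sketch: one DateTimeFormatter whose optional sections ([...]) accept
// timestamps with and without fractional seconds and with and without an
// offset. Not Connect code; names and pattern are illustrative.
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class MultiFormatParse {
    // Optional .SSS fraction and optional XXX offset; assume UTC when no offset is present.
    static final DateTimeFormatter FLEXIBLE = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]")
            .toFormatter()
            .withZone(ZoneOffset.UTC);

    static Instant parse(String ts) {
        return FLEXIBLE.parse(ts, Instant::from);
    }

    public static void main(String[] args) {
        // All three variants parse with the single formatter:
        System.out.println(parse("2020-10-06T12:12:27Z"));
        System.out.println(parse("2020-10-06T12:12:27.123Z"));
        System.out.println(parse("2020-10-06T12:12:27"));
    }
}
```

With {{SimpleDateFormat}}, each of these shapes would require its own configured pattern, which is exactly the limitation the report describes.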
> Also if there are multiple timestamp fields in the schema/events, then today 
> you have to chain multiple 

[jira] [Updated] (KAFKA-10627) Connect TimestampConverter transform does not support multiple formats for the same field and only allows one field to be transformed at a time

2021-07-28 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-10627:
--
Issue Type: New Feature  (was: Improvement)

> Connect TimestampConverter transform does not support multiple formats for 
> the same field and only allows one field to be transformed at a time
> ---
>
> Key: KAFKA-10627
> URL: https://issues.apache.org/jira/browse/KAFKA-10627
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Joshua Grisham
>Priority: Minor
>  Labels: connect-transformation, need-kip
>
> Some of the limitations of the *TimestampConverter* transform are causing 
> issues for us since we have a lot of different producers from different 
> systems producing events to some of our topics.  We try our best to have 
> governance on the data formats including strict usage of Avro schemas but 
> there are still variations in timestamp data types that are allowed by the 
> schema.
> In the end there will be multiple formats coming into the same timestamp 
> fields (for example, with and without milliseconds, with and without a 
> timezone specifier, etc).
> And then you get failed events in Connect with messages like this:
> {noformat}
> org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error 
> handler
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:514)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
>   at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
>   at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.connect.errors.DataException: Could not parse 
> timestamp: value (2020-10-06T12:12:27Z) does not match pattern (yyyy-MM-dd'T'HH:mm:ss.SSSX)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.convertTimestamp(TimestampConverter.java:450)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyValueWithSchema(TimestampConverter.java:375)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyWithSchema(TimestampConverter.java:362)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.apply(TimestampConverter.java:279)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
>   ... 14 more
> Caused by: java.text.ParseException: Unparseable date: "2020-10-06T12:12:27Z"
>   at java.text.DateFormat.parse(DateFormat.java:366)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   ... 21 more
> {noformat}
>  
> My thinking is that maybe a good solution is to switch from *java.util.Date* 
> to the *java.time* API, and from *SimpleDateFormat* to *DateTimeFormatter*, 
> which allows more sophisticated patterns in the config to match multiple 
> different allowable formats.
> For example instead of effectively doing this:
> {code:java}
> SimpleDateFormat format = new 
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");{code}
> It can be something like this:
> {code:java}
> DateTimeFormatter format = DateTimeFormatter.ofPattern("[yyyy-MM-dd[['T'][ 
> ]HH:mm:ss[.SSSz][.SSS[XXX][X");{code}
> Also if there are multiple timestamp fields in the schema/events, then today 
> you have to chain multiple *TimestampConverter* 

[jira] [Updated] (KAFKA-10627) Connect TimestampConverter transform does not support multiple formats for the same field and only allows one field to be transformed at a time

2021-07-28 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-10627:
--
Labels: connect-transformation need-kip  (was: )

> Connect TimestampConverter transform does not support multiple formats for 
> the same field and only allows one field to be transformed at a time
> ---
>
> Key: KAFKA-10627
> URL: https://issues.apache.org/jira/browse/KAFKA-10627
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Joshua Grisham
>Priority: Minor
>  Labels: connect-transformation, need-kip
>
> Some of the limitations of the *TimestampConverter* transform are causing 
> issues for us since we have a lot of different producers from different 
> systems producing events to some of our topics.  We try our best to have 
> governance on the data formats including strict usage of Avro schemas but 
> there are still variations in timestamp data types that are allowed by the 
> schema.
> In the end there will be multiple formats coming into the same timestamp 
> fields (for example, with and without milliseconds, with and without a 
> timezone specifier, etc).
> And then you get failed events in Connect with messages like this:
> {noformat}
> org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error 
> handler
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:514)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
>   at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
>   at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.connect.errors.DataException: Could not parse 
> timestamp: value (2020-10-06T12:12:27Z) does not match pattern (yyyy-MM-dd'T'HH:mm:ss.SSSX)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.convertTimestamp(TimestampConverter.java:450)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyValueWithSchema(TimestampConverter.java:375)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.applyWithSchema(TimestampConverter.java:362)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter.apply(TimestampConverter.java:279)
>   at 
> org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
>   at 
> org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
>   ... 14 more
> Caused by: java.text.ParseException: Unparseable date: "2020-10-06T12:12:27Z"
>   at java.text.DateFormat.parse(DateFormat.java:366)
>   at 
> org.apache.kafka.connect.transforms.TimestampConverter$1.toRaw(TimestampConverter.java:120)
>   ... 21 more
> {noformat}
>  
> My thinking is that maybe a good solution is to switch from *java.util.Date* 
> to the *java.time* API, and from *SimpleDateFormat* to *DateTimeFormatter*, 
> which allows more sophisticated patterns in the config to match multiple 
> different allowable formats.
> For example instead of effectively doing this:
> {code:java}
> SimpleDateFormat format = new 
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");{code}
> It can be something like this:
> {code:java}
> DateTimeFormatter format = DateTimeFormatter.ofPattern("[yyyy-MM-dd[['T'][ 
> ]HH:mm:ss[.SSSz][.SSS[XXX][X");{code}
> Also if there are multiple timestamp fields in the schema/events, then today 
> you have to chain multiple *TimestampConverter* 

[jira] [Commented] (KAFKA-13139) Empty response after requesting to restart a connector without the tasks results in NPE

2021-07-27 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388116#comment-17388116
 ] 

Randall Hauch commented on KAFKA-13139:
---

Just to clarify what appears to have happened.

As [~kpatelatwork] [mentions in a comment on the 
PR|https://github.com/apache/kafka/pull/11132], the behavior of the Connect 
restart API in AK 2.8 and earlier was always to return "204 NO CONTENT", not 
"200 OK" as mentioned in 
[KIP-745|https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks].
 Although the code used `Response.ok().build()`, the `RestClient` always 
processed the absence of a response body as `204 NO CONTENT`. So, to maintain 
the actual AK 2.x behavior in this branch of the code, we should instead return 
`204 NO CONTENT`.

I've corrected the KIP to reflect this older actual behavior of returning "204 
NO CONTENT". It was a minor but necessary correction.

Note that we have *not* changed the KIP or the behavior of returning "202 
ACCEPTED" when `includeTasks=true` and/or `onlyFailed=true`. These cases 
correspond to the new behavior added in KIP-745.
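A minimal plain-Java sketch (hypothetical names; no Jersey or Jackson involved) of why the distinction matters to a client like `RestClient`: deserializing an empty 200 body yields `null`, and a later dereference throws an NPE, whereas a 204 tells the client up front that no body is expected.

```java
public class EmptyBodyDemo {
    record HttpResponse(int status, String body) {}

    // Stand-in for JSON deserialization: an empty body has no value to return.
    static String deserialize(String body) {
        return (body == null || body.isEmpty()) ? null : body.trim();
    }

    // A client that decides whether to parse based only on the status code.
    static int bodyLength(HttpResponse resp) {
        if (resp.status() == 204) {
            return 0;                   // no body expected; nothing to parse
        }
        String value = deserialize(resp.body());
        return value.length();          // NPE if a 200 arrived with no body
    }

    public static void main(String[] args) {
        System.out.println(bodyLength(new HttpResponse(204, "")));  // 0
        try {
            bodyLength(new HttpResponse(200, ""));                  // empty 200 body
        } catch (NullPointerException e) {
            System.out.println("NPE on empty 200 body");
        }
    }
}
```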

> Empty response after requesting to restart a connector without the tasks 
> results in NPE
> ---
>
> Key: KAFKA-13139
> URL: https://issues.apache.org/jira/browse/KAFKA-13139
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 3.0.0
>
>
> After https://issues.apache.org/jira/browse/KAFKA-4793 a response to restart 
> only the connector (without any tasks) returns OK with an empty body. 
> As system test runs revealed, this causes an NPE in 
> [https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/RestClient.java#L135]
> We should return 204 (NO_CONTENT) instead. 
> This is a regression from previous behavior, therefore the ticket is marked 
> as a blocker candidate for 3.0





[jira] [Updated] (KAFKA-13118) Backport KAFKA-9887 to 3.0 branch after 3.0.0 release

2021-07-21 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-13118:
--
Description: 
We need to backport the fix (commit hash `0314801a8e`) for KAFKA-9887 to the 
`3.0` branch. That fix was merged to `trunk`, `2.8`, and `2.7` _after_ the 3.0 
code freeze, and that issue is not a blocker or regression.

Be sure to update the "fix version" on KAFKA-9887 when the backport is complete.

  was:We need to backport the fix (commit hash `0314801a8e`) for KAFKA-9887 to 
the `3.0` branch. That fix was merged to `trunk`, `2.8`, and `2.7` _after_ the 
3.0 code freeze, and that issue is not a blocker or regression.


> Backport KAFKA-9887 to 3.0 branch after 3.0.0 release
> -
>
> Key: KAFKA-13118
> URL: https://issues.apache.org/jira/browse/KAFKA-13118
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Affects Versions: 3.0.1
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 3.0.1
>
>
> We need to backport the fix (commit hash `0314801a8e`) for KAFKA-9887 to the 
> `3.0` branch. That fix was merged to `trunk`, `2.8`, and `2.7` _after_ the 
> 3.0 code freeze, and that issue is not a blocker or regression.
> Be sure to update the "fix version" on KAFKA-9887 when the backport is 
> complete.





[jira] [Created] (KAFKA-13118) Backport KAFKA-9887 to 3.0 branch after 3.0.0 release

2021-07-21 Thread Randall Hauch (Jira)
Randall Hauch created KAFKA-13118:
-

 Summary: Backport KAFKA-9887 to 3.0 branch after 3.0.0 release
 Key: KAFKA-13118
 URL: https://issues.apache.org/jira/browse/KAFKA-13118
 Project: Kafka
  Issue Type: Task
  Components: KafkaConnect
Affects Versions: 3.0.1
Reporter: Randall Hauch
Assignee: Randall Hauch
 Fix For: 3.0.1


We need to backport the fix (commit hash `0314801a8e`) for KAFKA-9887 to the 
`3.0` branch. That fix was merged to `trunk`, `2.8`, and `2.7` _after_ the 3.0 
code freeze, and that issue is not a blocker or regression.





[jira] [Commented] (KAFKA-9887) failed-task-count JMX metric not updated if task fails during startup

2021-07-21 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385005#comment-17385005
 ] 

Randall Hauch commented on KAFKA-9887:
--

Unfortunately, the 3.0 code freeze was two weeks ago, so I will create a new 
issue as a blocker for 3.0.1 to backport this fix to the `3.0` branch after the 
3.0.0 release is complete, and will link it to this issue.

> failed-task-count JMX metric not updated if task fails during startup
> -
>
> Key: KAFKA-9887
> URL: https://issues.apache.org/jira/browse/KAFKA-9887
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
>Reporter: Chris Egerton
>Assignee: Michael Carter
>Priority: Major
> Fix For: 3.1.0, 2.7.2, 2.8.1
>
>
> If a task fails on startup (specifically, during [this code 
> section|https://github.com/apache/kafka/blob/00a59b392d92b0d6d3a321ef9a53dae4b3a9d030/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L427-L468]),
>  the {{failed-task-count}} JMX metric is not updated to reflect the task 
> failure, even though the status endpoints in the REST API do report the task 
> as failed.





[jira] [Resolved] (KAFKA-13035) Kafka Connect: Update documentation for POST /connectors/(string: name)/restart to include task Restart behavior

2021-07-06 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-13035.
---
Fix Version/s: 3.0.0
 Reviewer: Randall Hauch
   Resolution: Fixed

Merged to `trunk` in time for 3.0.0

> Kafka Connect: Update documentation for POST /connectors/(string: 
> name)/restart to include task Restart behavior  
> --
>
> Key: KAFKA-13035
> URL: https://issues.apache.org/jira/browse/KAFKA-13035
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Kalpesh Patel
>Assignee: Kalpesh Patel
>Priority: Minor
> Fix For: 3.0.0
>
>
> KAFKA-4793 updated the behavior of POST /connectors/(string: name)/restart 
> based on the query parameters {{onlyFailed}} and {{includeTasks}}, per 
> [KIP-745|https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks].
>  We should update the documentation to reflect this.





[jira] [Assigned] (KAFKA-13028) AbstractConfig should allow config provider configuration to use variables referencing other config providers earlier in the list

2021-07-02 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-13028:
-

Assignee: Randall Hauch

> AbstractConfig should allow config provider configuration to use variables 
> referencing other config providers earlier in the list
> -
>
> Key: KAFKA-13028
> URL: https://issues.apache.org/jira/browse/KAFKA-13028
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, KafkaConnect
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>
> When AbstractConfig recognizes config provider properties, it instantiates 
> all of the config providers first and then uses those config providers to 
> resolve any variables in remaining configurations. This means that if you 
> define two config providers with:
> {code}
> config.providers=providerA,providerB
> ...
> {code}
> then the configuration properties for the second provider (e.g., `providerB`) 
> cannot use variables that reference the first provider (e.g., `providerA`). 
> In other words, this is not possible:
> {code}
> config.providers=providerA,providerB
> config.providers.providerA.class=FileConfigProvider
> config.providers.providerB.class=ComplexConfigProvider
> config.providers.providerB.param.client.key=${file:/usr/secrets:complex.client.key}
> config.providers.providerB.param.client.secret=${file:/usr/secrets:complex.client.secret}
> {code}
> This should be possible if the config providers are instantiated and 
> configured in the same order as they appear in the `config.providers` 
> property. The benefit is that it allows another level of indirection so that 
> any secrets required by a config provider can be resolved using an earlier 
> simple config provider.
> For example, config providers are often defined in Connect worker 
> configurations to resolve secrets within connector configurations, or to 
> resolve secrets within the worker configuration itself (e.g., producer or 
> consumer secrets). But it would be useful to also be able to resolve the 
> secrets needed by one configuration provider using another configuration 
> provider that is defined earlier in the list.





[jira] [Commented] (KAFKA-13028) AbstractConfig should allow config provider configuration to use variables referencing other config providers earlier in the list

2021-07-02 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373715#comment-17373715
 ] 

Randall Hauch commented on KAFKA-13028:
---

h2. Is a KIP required?
[KIP-421|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100829515]
 defines the current behavior, and it's not clear to me whether this new 
behavior requires a KIP. KIP-421 says:
{quote}
The KIP proposes on using the  existing AbstractConfig to automatically resolve 
indirect variables. The {{originals}} configurations will contain both the 
config provider configs as well as configuration properties. The constructor 
will first instantiate the ConfigProviders using the config provider configs, 
then it will find all the variables in the values of the {{originals}} 
configurations, attempt to resolve the variables using the named 
ConfigProviders, and then do the normal parsing and validation of the 
configurations.
{quote}

The KIP doesn't say _how_ it will instantiate and configure the ConfigProvider 
objects, just that it will do so. It does not say whether the configuration 
properties for a ConfigProvider may themselves use variables, or even the order 
in which the ConfigProviders will be instantiated and configured. 

On one hand, the AbstractConfig constructor could instantiate each 
ConfigProvider and resolve any variables using any ConfigProvider objects it 
had previously instantiated. One could argue that this new behavior is simply 
using the same variable resolution logic just more consistently throughout the 
constructor.

On the other hand, even though this would only subtly alter the existing 
behavior, it may be worthwhile to document the changed behavior in a KIP.

> AbstractConfig should allow config provider configuration to use variables 
> referencing other config providers earlier in the list
> -
>
> Key: KAFKA-13028
> URL: https://issues.apache.org/jira/browse/KAFKA-13028
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, KafkaConnect
>Reporter: Randall Hauch
>Priority: Major
>
> When AbstractConfig recognizes config provider properties, it instantiates 
> all of the config providers first and then uses those config providers to 
> resolve any variables in remaining configurations. This means that if you 
> define two config providers with:
> {code}
> config.providers=providerA,providerB
> ...
> {code}
> then the configuration properties for the second provider (e.g., `providerB`) 
> cannot use variables that reference the first provider (e.g., `providerA`). 
> In other words, this is not possible:
> {code}
> config.providers=providerA,providerB
> config.providers.providerA.class=FileConfigProvider
> config.providers.providerB.class=ComplexConfigProvider
> config.providers.providerB.param.client.key=${file:/usr/secrets:complex.client.key}
> config.providers.providerB.param.client.secret=${file:/usr/secrets:complex.client.secret}
> {code}
> This should be possible if the config providers are instantiated and 
> configured in the same order as they appear in the `config.providers` 
> property. The benefit is that it allows another level of indirection so that 
> any secrets required by a config provider can be resolved using an earlier 
> simple config provider.
> For example, config providers are often defined in Connect worker 
> configurations to resolve secrets within connector configurations, or to 
> resolve secrets within the worker configuration itself (e.g., producer or 
> consumer secrets). But it would be useful to also be able to resolve the 
> secrets needed by one configuration provider using another configuration 
> provider that is defined earlier in the list.





[jira] [Created] (KAFKA-13028) AbstractConfig should allow config provider configuration to use variables referencing other config providers earlier in the list

2021-07-02 Thread Randall Hauch (Jira)
Randall Hauch created KAFKA-13028:
-

 Summary: AbstractConfig should allow config provider configuration 
to use variables referencing other config providers earlier in the list
 Key: KAFKA-13028
 URL: https://issues.apache.org/jira/browse/KAFKA-13028
 Project: Kafka
  Issue Type: Improvement
  Components: clients, KafkaConnect
Reporter: Randall Hauch


When AbstractConfig recognizes config provider properties, it instantiates all 
of the config providers first and then uses those config providers to resolve 
any variables in remaining configurations. This means that if you define two 
config providers with:

{code}
config.providers=providerA,providerB
...
{code}
then the configuration properties for the second provider (e.g., `providerB`) 
cannot use variables that reference the first provider (e.g., `providerA`). In 
other words, this is not possible:

{code}
config.providers=providerA,providerB
config.providers.providerA.class=FileConfigProvider
config.providers.providerB.class=ComplexConfigProvider
config.providers.providerB.param.client.key=${file:/usr/secrets:complex.client.key}
config.providers.providerB.param.client.secret=${file:/usr/secrets:complex.client.secret}
{code}

This should be possible if the config providers are instantiated and configured 
in the same order as they appear in the `config.providers` property. The 
benefit is that it allows another level of indirection so that any secrets 
required by a config provider can be resolved using an earlier simple config 
provider.

For example, config providers are often defined in Connect worker 
configurations to resolve secrets within connector configurations, or to 
resolve secrets within the worker configuration itself (e.g., producer or 
consumer secrets). But it would be useful to also be able to resolve the 
secrets needed by one configuration provider using another configuration 
provider that is defined earlier in the list.
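A sketch of the proposed ordering (hypothetical, simplified types, not the actual `AbstractConfig` or `ConfigProvider` API): providers are instantiated in list order, and each provider's own configuration is resolved using only the providers created before it. Here a "provider" is reduced to a `(path, key) -> value` function and variable substitution is done with a regex.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderedProviderResolution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^:}]+):([^:}]*):([^}]+)\\}");

    // Substitute ${providerName:path:key} variables using only the providers
    // that have already been instantiated (those earlier in the list).
    static String resolve(String value, Map<String, BiFunction<String, String, String>> ready) {
        Matcher m = VAR.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            BiFunction<String, String, String> provider = ready.get(m.group(1));
            String replacement = provider == null
                    ? m.group(0)                               // unknown provider: leave variable as-is
                    : provider.apply(m.group(2), m.group(3));  // (path, key) -> secret
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "providerA" stands in for a FileConfigProvider backed by an in-memory map.
        Map<String, String> secrets = Map.of("complex.client.key", "k123");
        Map<String, BiFunction<String, String, String>> ready = new LinkedHashMap<>();
        ready.put("providerA", (path, key) -> secrets.get(key));

        // providerB's own config references providerA; because providerA was
        // instantiated first (list order), the variable can be resolved.
        String raw = "${providerA:/usr/secrets:complex.client.key}";
        System.out.println(resolve(raw, ready));   // k123
    }
}
```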





[jira] [Resolved] (KAFKA-12717) Remove internal converter config properties

2021-07-01 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-12717.
---
  Reviewer: Randall Hauch
Resolution: Fixed

Merged to the `trunk` branch for 3.0.0.

> Remove internal converter config properties
> ---
>
> Key: KAFKA-12717
> URL: https://issues.apache.org/jira/browse/KAFKA-12717
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> KAFKA-5540 / 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig]
>  deprecated but did not officially remove Connect's internal converter worker 
> config properties. With the upcoming 3.0 release, we can make the 
> backwards-incompatible change of completely removing these properties once 
> and for all.
>  
> One migration path for users who may still be running Connect clusters with 
> different internal converters can be:
>  # Stop all workers on the cluster
>  # For each internal topic (config, offsets, and status):
>  ## Create a new topic to take the place of the existing one
>  ## For every message in the existing topic:
>  ### Deserialize the message's key and value using the Connect cluster's old 
> internal key and value converters
>  ### Serialize the message's key and value using the [JSON 
> converter|https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java]
>  with schemas disabled (by setting the {{schemas.enable}} property to 
> {{false}})
>  ### Write a message with the new key and value to the new internal topic
>  # Reconfigure each Connect worker to use the newly-created internal topics 
> from step 2
>  # Start all workers on the cluster
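For reference, what the `schemas.enable` setting in step 2.3 changes: with schemas enabled, the JSON converter wraps each value in a schema/payload envelope, while with schemas disabled it writes the bare JSON. A toy illustration of the two shapes (not the real `JsonConverter`; the envelope below is a hand-written approximation for a string value):

```java
public class SchemaEnvelope {
    // Mimics the shape of the JSON converter's output for a string value:
    // schemas enabled -> {"schema": ..., "payload": ...}; disabled -> bare JSON.
    static String serialize(String json, boolean schemasEnable) {
        return schemasEnable
                ? "{\"schema\":{\"type\":\"string\",\"optional\":false},\"payload\":" + json + "}"
                : json;
    }

    public static void main(String[] args) {
        System.out.println(serialize("\"offset-42\"", true));
        System.out.println(serialize("\"offset-42\"", false)); // "offset-42"
    }
}
```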





[jira] [Resolved] (KAFKA-12482) Remove deprecated rest.host.name and rest.port Connect worker configs

2021-06-23 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-12482.
---
  Reviewer: Randall Hauch
Resolution: Fixed

Merged to `trunk`.

> Remove deprecated rest.host.name and rest.port Connect worker configs
> -
>
> Key: KAFKA-12482
> URL: https://issues.apache.org/jira/browse/KAFKA-12482
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Assignee: Kalpesh Patel
>Priority: Critical
> Fix For: 3.0.0
>
>
> The following Connect worker configuration properties were deprecated and 
> should be removed in 3.0.0:
>  * {{rest.host.name}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{rest.port}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
> See KAFKA-12717 for removing the internal converter configurations:
>  * {{internal.key.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
>  * {{internal.value.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])





[jira] [Resolved] (KAFKA-12484) Enable Connect's connector log contexts by default

2021-06-22 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-12484.
---
  Reviewer: Konstantine Karantasis
Resolution: Fixed

Merged to the `trunk` branch for inclusion in 3.0.0.

> Enable Connect's connector log contexts by default
> --
>
> Key: KAFKA-12484
> URL: https://issues.apache.org/jira/browse/KAFKA-12484
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> Connect's Log4J configuration does not by default log the connector contexts. 
> That feature was added in 
> [KIP-449|https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs]
>  and first appeared in AK 2.3.0, but it was not enabled by default since that 
> would not have been backward compatible.
> But with AK 3.0.0, we have the opportunity to change the default in 
> {{config/connect-log4j.properties}} to enable connector log contexts.
> See 
> [KIP-721|https://cwiki.apache.org/confluence/display/KAFKA/KIP-721%3A+Enable+connector+log+contexts+in+Connect+Log4j+configuration].
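For illustration, the kind of Log4j change KIP-721 makes (the pattern below is an approximation of the shipped `config/connect-log4j.properties`, not a verbatim copy; the `connector.context` MDC key comes from KIP-449):

```properties
# Include the per-connector context (set via MDC) in every log line.
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %X{connector.context}%m (%c:%L)%n
```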





[jira] [Resolved] (KAFKA-12483) Enable client overrides in connector configs by default

2021-06-22 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-12483.
---
Resolution: Fixed

Merged to the `trunk` branch for inclusion in 3.0.0.

> Enable client overrides in connector configs by default
> ---
>
> Key: KAFKA-12483
> URL: https://issues.apache.org/jira/browse/KAFKA-12483
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> Connector-specific client overrides were added in 
> [KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy],
>  but that feature is not enabled by default since it would not have been 
> backward compatible.
> But with AK 3.0.0, we have the opportunity to enable connector client 
> overrides by default by changing the worker config's 
> {{connector.client.config.override.policy}} default value to {{All}}.
> See 
> [KIP-722|https://cwiki.apache.org/confluence/display/KAFKA/KIP-722%3A+Enable+connector+client+overrides+by+default].
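Illustrative configuration (property names per KIP-458/KIP-722): with the worker default changed to {{All}}, a connector config can override client properties directly via the prefixed override keys.

```properties
# Worker config: the new 3.0.0 default, shown explicitly.
connector.client.config.override.policy=All

# A source connector config may then override producer client properties,
# e.g. in its JSON config:
#   "producer.override.linger.ms": "500"
```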





[jira] [Commented] (KAFKA-12252) Distributed herder tick thread loops rapidly when worker loses leadership

2021-06-18 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365610#comment-17365610
 ] 

Randall Hauch commented on KAFKA-12252:
---

Backported to 2.6 for inclusion in any subsequent 2.6.3 patch release.

> Distributed herder tick thread loops rapidly when worker loses leadership
> -
>
> Key: KAFKA-12252
> URL: https://issues.apache.org/jira/browse/KAFKA-12252
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.6.3, 2.7.2, 2.8.1
>
>
> When a new session key is read from the config topic, if the worker is the 
> leader, it [schedules a new key 
> rotation|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1579-L1581].
>  The time between key rotations is configurable but defaults to an hour.
> The herder then continues its tick loop, which usually ends with a long poll 
> for rebalance activity. However, when a key rotation is scheduled, it will 
> [limit the time spent 
> polling|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L384-L388]
>  at the end of the tick loop in order to be able to perform the rotation.
> Once woken up, the worker checks to see if a key rotation is necessary and, 
> if so, [sets the expected key rotation time to 
> Long.MAX_VALUE|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L344],
>  then [writes a new session key to the config 
> topic|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L345-L348].
>  The problem is, [the worker only ever decides a key rotation is necessary if 
> it is still the 
> leader|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L456-L469].
>  If the worker is no longer the leader at the time of the key rotation 
> (likely due to falling out of the cluster after losing contact with the group 
> coordinator), its key expiration time won’t be reset, and the long poll for 
> rebalance activity at the end of the tick loop will be given a timeout of 0 
> ms and result in the tick loop being immediately restarted. Even if the 
> worker reads a new session key from the config topic, it’ll continue looping 
> like this since its scheduled key rotation won’t be updated. At this point, 
> the only thing that would help the worker get back into a healthy state would 
> be if it were made the leader of the cluster again.
> One possible fix could be to add a conditional check in the tick thread to 
> only limit the time spent on rebalance polling if the worker is currently the 
> leader.
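The busy-loop mechanics described above can be sketched as a timeout computation (hypothetical simplified names, not the actual `DistributedHerder` code):

```java
public class TickLoopTimeout {
    // The rebalance poll waits at most until the next scheduled key rotation.
    // A rotation time in the past clamps the timeout to 0 ms; if a non-leader
    // never resets its rotation time, every iteration polls for 0 ms and the
    // tick loop spins.
    static long pollTimeoutMs(long nowMs, long nextKeyRotationMs, long defaultTimeoutMs) {
        return Math.max(0, Math.min(defaultTimeoutMs, nextKeyRotationMs - nowMs));
    }

    public static void main(String[] args) {
        System.out.println(pollTimeoutMs(10_000, 5_000, 60_000));          // 0 -> busy loop
        System.out.println(pollTimeoutMs(10_000, Long.MAX_VALUE, 60_000)); // 60000 -> normal poll
    }
}
```

The fix suggested in the ticket amounts to skipping the `nextKeyRotationMs` bound entirely when the worker is not the leader.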





[jira] [Updated] (KAFKA-12252) Distributed herder tick thread loops rapidly when worker loses leadership

2021-06-18 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12252:
--
Fix Version/s: 2.6.3

> Distributed herder tick thread loops rapidly when worker loses leadership
> -
>
> Key: KAFKA-12252
> URL: https://issues.apache.org/jira/browse/KAFKA-12252
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.6.3, 2.7.2, 2.8.1
>
>
> When a new session key is read from the config topic, if the worker is the 
> leader, it [schedules a new key 
> rotation|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1579-L1581].
>  The time between key rotations is configurable but defaults to an hour.
> The herder then continues its tick loop, which usually ends with a long poll 
> for rebalance activity. However, when a key rotation is scheduled, it will 
> [limit the time spent 
> polling|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L384-L388]
>  at the end of the tick loop in order to be able to perform the rotation.
> Once woken up, the worker checks to see if a key rotation is necessary and, 
> if so, [sets the expected key rotation time to 
> Long.MAX_VALUE|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L344],
>  then [writes a new session key to the config 
> topic|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L345-L348].
>  The problem is, [the worker only ever decides a key rotation is necessary if 
> it is still the 
> leader|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L456-L469].
>  If the worker is no longer the leader at the time of the key rotation 
> (likely due to falling out of the cluster after losing contact with the group 
> coordinator), its key expiration time won’t be reset, and the long poll for 
> rebalance activity at the end of the tick loop will be given a timeout of 0 
> ms and result in the tick loop being immediately restarted. Even if the 
> worker reads a new session key from the config topic, it’ll continue looping 
> like this since its scheduled key rotation won’t be updated. At this point, 
> the only thing that would help the worker get back into a healthy state would 
> be if it were made the leader of the cluster again.
> One possible fix could be to add a conditional check in the tick thread to 
> only limit the time spent on rebalance polling if the worker is currently the 
> leader.





[jira] [Updated] (KAFKA-12252) Distributed herder tick thread loops rapidly when worker loses leadership

2021-06-18 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12252:
--
Fix Version/s: 2.7.2

Backported to 2.7 for inclusion in any subsequent 2.7.2 patch release.

> Distributed herder tick thread loops rapidly when worker loses leadership
> -
>
> Key: KAFKA-12252
> URL: https://issues.apache.org/jira/browse/KAFKA-12252
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.7.2, 2.8.1
>
>
> When a new session key is read from the config topic, if the worker is the 
> leader, it [schedules a new key 
> rotation|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1579-L1581].
>  The time between key rotations is configurable but defaults to an hour.
> The herder then continues its tick loop, which usually ends with a long poll 
> for rebalance activity. However, when a key rotation is scheduled, it will 
> [limit the time spent 
> polling|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L384-L388]
>  at the end of the tick loop in order to be able to perform the rotation.
> Once woken up, the worker checks to see if a key rotation is necessary and, 
> if so, [sets the expected key rotation time to 
> Long.MAX_VALUE|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L344],
>  then [writes a new session key to the config 
> topic|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L345-L348].
>  The problem is, [the worker only ever decides a key rotation is necessary if 
> it is still the 
> leader|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L456-L469].
>  If the worker is no longer the leader at the time of the key rotation 
> (likely due to falling out of the cluster after losing contact with the group 
> coordinator), its key expiration time won’t be reset, and the long poll for 
> rebalance activity at the end of the tick loop will be given a timeout of 0 
> ms and result in the tick loop being immediately restarted. Even if the 
> worker reads a new session key from the config topic, it’ll continue looping 
> like this since its scheduled key rotation won’t be updated. At this point, 
> the only thing that would help the worker get back into a healthy state would 
> be if it were made the leader of the cluster again.
> One possible fix could be to add a conditional check in the tick thread to 
> only limit the time spent on rebalance polling if the worker is currently the 
> leader.





[jira] [Commented] (KAFKA-12483) Enable client overrides in connector configs by default

2021-06-07 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358864#comment-17358864
 ] 

Randall Hauch commented on KAFKA-12483:
---

KIP-722 was approved today.

> Enable client overrides in connector configs by default
> ---
>
> Key: KAFKA-12483
> URL: https://issues.apache.org/jira/browse/KAFKA-12483
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> Connector-specific client overrides were added in 
> [KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy],
>  but that feature is not enabled by default since it would not have been 
> backward compatible.
> But with AK 3.0.0, we have the opportunity to enable connector client 
> overrides by default by changing the worker config's 
> {{connector.client.config.override.policy}} default value to {{All}}.
> See 
> [KIP-722|https://cwiki.apache.org/confluence/display/KAFKA/KIP-722%3A+Enable+connector+client+overrides+by+default].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12484) Enable Connect's connector log contexts by default

2021-06-07 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358860#comment-17358860
 ] 

Randall Hauch commented on KAFKA-12484:
---

KIP-721 was approved today.

> Enable Connect's connector log contexts by default
> --
>
> Key: KAFKA-12484
> URL: https://issues.apache.org/jira/browse/KAFKA-12484
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> Connect's Log4J configuration does not by default log the connector contexts. 
> That feature was added in 
> [KIP-449|https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs]
>  and first appeared in AK 2.3.0, but it was not enabled by default since that 
> would not have been backward compatible.
> But with AK 3.0.0, we have the opportunity to change the default in 
> {{config/connect-log4j.properties}} to enable connector log contexts.
> See 
> [KIP-721|https://cwiki.apache.org/confluence/display/KAFKA/KIP-721%3A+Enable+connector+log+contexts+in+Connect+Log4j+configuration].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12904) Connect's validate REST endpoint uses incorrect timeout

2021-06-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12904:
--
Description: 
The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:

{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
{code}

but should be:
{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
{code}

Users may run into this whenever validating a connector configuration where the 
connector implementation takes more than 90 seconds to actually validate the 
configuration. 
* Without this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
REST requests might **_not_** return `500 Internal Server Error` and may block 
(the request thread) for a long period of time. 
* With this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
REST requests will **_now_** return `500 Internal Server Error` if the 
connector does not complete the validation of a connector configuration within 
90 seconds.

The user will not see a difference between the behavior before or after this 
fix if/when the connectors complete validation of connector configurations 
before 90 seconds, since the method will return those results to the client.


  was:
The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:

{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
{code}

but should be:
{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
{code}

Users may run into this whenever validating a connector configuration where the 
connector implementation takes more than 90 seconds to actually validate the 
configuration. 
* Without this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
REST requests might **_not_** return `500 Internal Server Error` and may block 
(the request thread) for a long period of time. 
* With this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
REST requests will **_now_** return `500 Internal Server Error` if the 
connector does not complete the validation of a connector configuration within 
90 seconds.



> Connect's validate REST endpoint uses incorrect timeout
> ---
>
> Key: KAFKA-12904
> URL: https://issues.apache.org/jira/browse/KAFKA-12904
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.6.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>
> The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
> method to validate connector configurations used the 
> `ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
> doing so it introduced a bug where the timeout was actually 1000x longer than 
> desired/specified.
> In particular, the following line is currently:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.SECONDS);
> {code}
> but should be:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.MILLISECONDS);
> {code}
> Users may run into this whenever validating a connector configuration where 
> the connector implementation takes more than 90 seconds to actually validate 
> the configuration. 
> * Without this fix, the `PUT 
> /connector-plugins/(string:name)/config/validate` REST requests might 
> **_not_** return `500 Internal Server Error` and may block (the request 
> thread) for a long period of time. 
> * With this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
> REST requests will **_now_** return `500 Internal Server Error` if the 
> connector does not complete the validation of a connector configuration 
> within 90 seconds.
> The user will not see a difference between the behavior before or after this 
> fix if/when the connectors complete validation of connector configurations 
> before 90 seconds, since the method will return those results to the client.
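The 1000x magnitude of the bug follows directly from the unit mismatch. A small self-contained check (assuming the constant's value of 90,000 ms, i.e. the 90 seconds described above):

```java
import java.util.concurrent.TimeUnit;

public class TimeoutUnitsDemo {
    // Same value as ConnectorsResource.REQUEST_TIMEOUT_MS: 90 seconds,
    // expressed in milliseconds.
    static final long REQUEST_TIMEOUT_MS = 90_000L;

    public static void main(String[] args) {
        // Bug: the millisecond value is passed with TimeUnit.SECONDS, so the
        // Future waits 90,000 seconds (25 hours) instead of 90 seconds.
        long buggyMs = TimeUnit.SECONDS.toMillis(REQUEST_TIMEOUT_MS);
        // Fix: interpret the value with the unit it was defined in.
        long fixedMs = TimeUnit.MILLISECONDS.toMillis(REQUEST_TIMEOUT_MS);
        System.out.println(buggyMs / fixedMs); // factor by which the wait is too long
    }
}
```

The second argument of `Future.get(long, TimeUnit)` only tells the JVM how to interpret the first, which is why the compiler cannot catch this class of mistake.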



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (KAFKA-12904) Connect's validate REST endpoint uses incorrect timeout

2021-06-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12904:
--
Description: 
The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:

{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
{code}

but should be:
{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
{code}

Users may run into this whenever validating a connector configuration where the 
connector implementation takes more than 90 seconds to actually validate the 
configuration. 
* Without this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
REST requests might **_not_** return `500 Internal Server Error` and may block 
(the request thread) for a long period of time. 
* With this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
REST requests will **_now_** return `500 Internal Server Error` if the 
connector does not complete the validation of a connector configuration within 
90 seconds.


  was:
The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:

{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
{code}

but should be:
{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
{code}



> Connect's validate REST endpoint uses incorrect timeout
> ---
>
> Key: KAFKA-12904
> URL: https://issues.apache.org/jira/browse/KAFKA-12904
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.6.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>
> The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
> method to validate connector configurations used the 
> `ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
> doing so it introduced a bug where the timeout was actually 1000x longer than 
> desired/specified.
> In particular, the following line is currently:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.SECONDS);
> {code}
> but should be:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.MILLISECONDS);
> {code}
> Users may run into this whenever validating a connector configuration where 
> the connector implementation takes more than 90 seconds to actually validate 
> the configuration. 
> * Without this fix, the `PUT 
> /connector-plugins/(string:name)/config/validate` REST requests might 
> **_not_** return `500 Internal Server Error` and may block (the request 
> thread) for a long period of time. 
> * With this fix, the `PUT /connector-plugins/(string:name)/config/validate` 
> REST requests will **_now_** return `500 Internal Server Error` if the 
> connector does not complete the validation of a connector configuration 
> within 90 seconds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9374) Worker can be disabled by blocked connectors

2021-06-07 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358713#comment-17358713
 ] 

Randall Hauch commented on KAFKA-9374:
--

Please ignore link to [GitHub Pull Request 
#10833|https://github.com/apache/kafka/pull/10833]. I created it with the wrong 
KAFKA issue number.

> Worker can be disabled by blocked connectors
> 
>
> Key: KAFKA-9374
> URL: https://issues.apache.org/jira/browse/KAFKA-9374
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 
> 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 2.6.0
>
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, 
> {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} 
> methods, the worker will be disabled for some types of requests thereafter, 
> including connector creation, connector reconfiguration, and connector 
> deletion.
>  -This only occurs in distributed mode and is due to the threading model used 
> by the 
> [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
>  class.- This affects both distributed and standalone mode. Distributed 
> herders perform some connector work synchronously in their {{tick}} thread, 
> which also handles group membership and some REST requests. The majority of 
> the herder methods for the standalone herder are {{synchronized}}, including 
> those for creating, updating, and deleting connectors; as long as one of 
> those methods blocks, all subsequent calls to any of these methods will also 
> be blocked.
>  
> One potential solution could be to treat connectors that fail to start, stop, 
> etc. in time similarly to tasks that fail to stop within the [task graceful 
> shutdown timeout 
> period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
>  by handling all connector interactions on a separate thread, waiting for 
> them to complete within a timeout, and abandoning the thread (and 
> transitioning the connector to the {{FAILED}} state, if it has been created 
> at all) if that timeout expires.
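The approach proposed above can be sketched with plain java.util.concurrent primitives (names hypothetical; this is not the actual Worker implementation): run the connector interaction on a separate thread, wait up to a timeout, and transition the connector to {{FAILED}} while abandoning the thread if the timeout expires.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedConnectorCall {
    enum State { RUNNING, FAILED }

    // Run a connector interaction (start/stop/validate/...) on its own thread
    // and abandon it if it does not finish within timeoutMs, so a hung
    // connector cannot block the herder's request-handling thread.
    static State callWithTimeout(Runnable connectorOp, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> result = executor.submit(connectorOp);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return State.RUNNING;
        } catch (TimeoutException e) {
            result.cancel(true);   // abandon the hung thread
            return State.FAILED;   // transition the connector to FAILED
        } catch (InterruptedException | ExecutionException e) {
            return State.FAILED;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A well-behaved connector op completes and stays RUNNING.
        System.out.println(callWithTimeout(() -> { }, 1_000L));
        // A hung connector op is abandoned after the timeout and marked FAILED.
        System.out.println(callWithTimeout(() -> {
            try { Thread.sleep(60_000L); } catch (InterruptedException ignored) { }
        }, 100L));
    }
}
```

Note that `Future.cancel(true)` only interrupts the worker thread; a connector that ignores interruption keeps running until the JVM exits, which is the usual cost of "abandoning" a thread rather than killing it.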



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-12904) Connect's validate REST endpoint uses incorrect timeout

2021-06-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-12904:
-

Assignee: Randall Hauch

> Connect's validate REST endpoint uses incorrect timeout
> ---
>
> Key: KAFKA-12904
> URL: https://issues.apache.org/jira/browse/KAFKA-12904
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.6.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Major
>
> The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
> method to validate connector configurations used the 
> `ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
> doing so it introduced a bug where the timeout was actually 1000x longer than 
> desired/specified.
> In particular, the following line is currently:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.SECONDS);
> {code}
> but should be:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.MILLISECONDS);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12904) Connect's validate REST endpoint uses incorrect timeout

2021-06-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12904:
--
Description: 
The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:

{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
{code}

but should be:
{code:java}
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
{code}


  was:
The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:
```
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
```
but should be:
```
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
```



> Connect's validate REST endpoint uses incorrect timeout
> ---
>
> Key: KAFKA-12904
> URL: https://issues.apache.org/jira/browse/KAFKA-12904
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.6.0
>Reporter: Randall Hauch
>Priority: Major
>
> The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
> method to validate connector configurations used the 
> `ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
> doing so it introduced a bug where the timeout was actually 1000x longer than 
> desired/specified.
> In particular, the following line is currently:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.SECONDS);
> {code}
> but should be:
> {code:java}
> return 
> validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
> TimeUnit.MILLISECONDS);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12904) Connect's validate REST endpoint uses incorrect timeout

2021-06-07 Thread Randall Hauch (Jira)
Randall Hauch created KAFKA-12904:
-

 Summary: Connect's validate REST endpoint uses incorrect timeout
 Key: KAFKA-12904
 URL: https://issues.apache.org/jira/browse/KAFKA-12904
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.6.0
Reporter: Randall Hauch


The fix for KAFKA-9374 changed how the `ConnectorPluginsResource` and its 
method to validate connector configurations used the 
`ConnectorsResource.REQUEST_TIMEOUT_MS` constant (90 seconds). However, in 
doing so it introduced a bug where the timeout was actually 1000x longer than 
desired/specified.

In particular, the following line is currently:
```
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, TimeUnit.SECONDS);
```
but should be:
```
return 
validationCallback.get(ConnectorsResource.REQUEST_TIMEOUT_MS, 
TimeUnit.MILLISECONDS);
```




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-12482) Remove deprecated rest.host.name and rest.port Connect worker configs

2021-06-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-12482:
-

Assignee: Kalpesh Patel

> Remove deprecated rest.host.name and rest.port Connect worker configs
> -
>
> Key: KAFKA-12482
> URL: https://issues.apache.org/jira/browse/KAFKA-12482
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Assignee: Kalpesh Patel
>Priority: Critical
> Fix For: 3.0.0
>
>
> The following Connect worker configuration properties were deprecated and 
> should be removed in 3.0.0:
>  * {{rest.host.name}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{rest.port}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
> See KAFKA-12717 for removing the internal converter configurations:
>  * {{internal.key.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
>  * {{internal.value.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12482) Remove deprecated rest.host.name and rest.port Connect worker configs

2021-06-03 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356530#comment-17356530
 ] 

Randall Hauch commented on KAFKA-12482:
---

Update: it's time to remove these -- it's been over three years since they were 
deprecated. See discussion thread on [Changing Defaults in Connect 3.0.0 wiki 
page|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177047362].

We just need a PR to make this change. No KIP is required: KIP-208 included the 
deprecation and was approved and implemented in AK 1.1. Technically we can 
remove anything deprecated in any subsequent major release, and this has been 
deprecated long enough (3+ years) to give people a chance to migrate.

> Remove deprecated rest.host.name and rest.port Connect worker configs
> -
>
> Key: KAFKA-12482
> URL: https://issues.apache.org/jira/browse/KAFKA-12482
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Priority: Critical
> Fix For: 3.0.0
>
>
> The following Connect worker configuration properties were deprecated and 
> should be removed in 3.0.0:
>  * {{rest.host.name}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{rest.port}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
> See KAFKA-12717 for removing the internal converter configurations:
>  * {{internal.key.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
>  * {{internal.value.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12482) Remove deprecated rest.host.name and rest.port Connect worker configs

2021-06-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12482:
--
Description: 
The following Connect worker configuration properties were deprecated and 
should be removed in 3.0.0:
 * {{rest.host.name}} (deprecated in 
[KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])

 * {{rest.port}} (deprecated in 
[KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])

See KAFKA-12717 for removing the internal converter configurations:
 * {{internal.key.converter}} (deprecated in 
[KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
 * {{internal.value.converter}} (deprecated in 
[KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])

  was:
The following Connect worker configuration properties were deprecated and 
should be removed in 3.0.0:
 * {{rest.host.name}} (deprecated in 
[KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])

 * {{rest.port}} (deprecated in 
[KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
 * {{internal.key.converter}} (deprecated in 
[KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
 * {{internal.value.converter}} (deprecated in 
[KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])


> Remove deprecated rest.host.name and rest.port Connect worker configs
> -
>
> Key: KAFKA-12482
> URL: https://issues.apache.org/jira/browse/KAFKA-12482
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Priority: Critical
> Fix For: 3.0.0
>
>
> The following Connect worker configuration properties were deprecated and 
> should be removed in 3.0.0:
>  * {{rest.host.name}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{rest.port}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
> See KAFKA-12717 for removing the internal converter configurations:
>  * {{internal.key.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
>  * {{internal.value.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12717) Remove internal converter config properties

2021-06-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12717:
--
Issue Type: Task  (was: Bug)

> Remove internal converter config properties
> ---
>
> Key: KAFKA-12717
> URL: https://issues.apache.org/jira/browse/KAFKA-12717
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>  Labels: needs-kip
>
> KAFKA-5540 / 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig]
>  deprecated but did not officially remove Connect's internal converter worker 
> config properties. With the upcoming 3.0 release, we can make the 
> backwards-incompatible change of completely removing these properties once 
> and for all.
>  
> One migration path for users who may still be running Connect clusters with 
> different internal converters can be:
>  # Stop all workers on the cluster
>  # For each internal topic (config, offsets, and status):
>  ## Create a new topic to take the place of the existing one
>  ## For every message in the existing topic:
>  ### Deserialize the message's key and value using the Connect cluster's old 
> internal key and value converters
>  ### Serialize the message's key and value using the [JSON 
> converter|https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java]
>  with schemas disabled (by setting the {{schemas.enable}} property to 
> {{false}})
>  ### Write a message with the new key and value to the new internal topic
>  # Reconfigure each Connect worker to use the newly-created internal topics 
> from step 2
>  # Start all workers on the cluster
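Step 2 of the migration path above is, in essence, a per-record re-encode. A self-contained sketch of that inner loop, using a toy {{Codec}} interface as a stand-in for Connect's {{Converter}} API (the real migration would use the cluster's old converters on the decode side and {{JsonConverter}} with {{schemas.enable=false}} on the encode side):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class InternalTopicMigration {
    // Toy stand-in for org.apache.kafka.connect.storage.Converter.
    interface Codec {
        Object deserialize(byte[] raw);
        byte[] serialize(Object value);
    }

    // "Old" format: UTF-8 text wrapped in angle brackets (purely illustrative,
    // standing in for whatever non-JSON internal converter was configured).
    static final Codec OLD = new Codec() {
        public Object deserialize(byte[] raw) {
            String s = new String(raw, StandardCharsets.UTF_8);
            return s.substring(1, s.length() - 1);
        }
        public byte[] serialize(Object value) {
            return ("<" + value + ">").getBytes(StandardCharsets.UTF_8);
        }
    };

    // "New" format: plain UTF-8 (standing in for schemaless JSON).
    static final Codec NEW = new Codec() {
        public Object deserialize(byte[] raw) {
            return new String(raw, StandardCharsets.UTF_8);
        }
        public byte[] serialize(Object value) {
            return value.toString().getBytes(StandardCharsets.UTF_8);
        }
    };

    // Step 2.2: decode every record with the old converter, re-encode it with
    // the new one; the real migration would then produce each re-encoded
    // record to the newly created internal topic.
    static List<byte[]> migrate(List<byte[]> oldTopicRecords) {
        return oldTopicRecords.stream()
                .map(raw -> NEW.serialize(OLD.deserialize(raw)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        byte[] migrated = migrate(List.of(OLD.serialize("offset-42"))).get(0);
        System.out.println(new String(migrated, StandardCharsets.UTF_8));
    }
}
```

The same loop must be applied to keys and values independently, for each of the config, offsets, and status topics, before the workers are pointed at the new topics.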



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12717) Remove internal converter config properties

2021-06-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12717:
--
Fix Version/s: 3.0.0

> Remove internal converter config properties
> ---
>
> Key: KAFKA-12717
> URL: https://issues.apache.org/jira/browse/KAFKA-12717
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> KAFKA-5540 / 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig]
>  deprecated but did not officially remove Connect's internal converter worker 
> config properties. With the upcoming 3.0 release, we can make the 
> backwards-incompatible change of completely removing these properties once 
> and for all.
>  
> One migration path for users who may still be running Connect clusters with 
> different internal converters can be:
>  # Stop all workers on the cluster
>  # For each internal topic (config, offsets, and status):
>  ## Create a new topic to take the place of the existing one
>  ## For every message in the existing topic:
>  ### Deserialize the message's key and value using the Connect cluster's old 
> internal key and value converters
>  ### Serialize the message's key and value using the [JSON 
> converter|https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java]
>  with schemas disabled (by setting the {{schemas.enable}} property to 
> {{false}})
>  ### Write a message with the new key and value to the new internal topic
>  # Reconfigure each Connect worker to use the newly-created internal topics 
> from step 2
>  # Start all workers on the cluster



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12482) Remove deprecated rest.host.name and rest.port Connect worker configs

2021-06-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12482:
--
Issue Type: Task  (was: Improvement)

> Remove deprecated rest.host.name and rest.port Connect worker configs
> -
>
> Key: KAFKA-12482
> URL: https://issues.apache.org/jira/browse/KAFKA-12482
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Priority: Critical
> Fix For: 3.0.0
>
>
> The following Connect worker configuration properties were deprecated and 
> should be removed in 3.0.0:
>  * {{rest.host.name}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{rest.port}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{internal.key.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
>  * {{internal.value.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-12482) Remove deprecated rest.host.name and rest.port Connect worker configs

2021-06-03 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12482:
--
Summary: Remove deprecated rest.host.name and rest.port Connect worker 
configs  (was: Remove deprecated Connect worker configs)

> Remove deprecated rest.host.name and rest.port Connect worker configs
> -
>
> Key: KAFKA-12482
> URL: https://issues.apache.org/jira/browse/KAFKA-12482
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Randall Hauch
>Priority: Critical
> Fix For: 3.0.0
>
>
> The following Connect worker configuration properties were deprecated and 
> should be removed in 3.0.0:
>  * {{rest.host.name}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{rest.port}} (deprecated in 
> [KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface])
>  * {{internal.key.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])
>  * {{internal.value.converter}} (deprecated in 
> [KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4793) Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed tasks

2021-06-02 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355973#comment-17355973
 ] 

Randall Hauch commented on KAFKA-4793:
--

[~kpatelatwork], thanks for offering. Yes, your help would be very much 
welcomed. I can work with you offline about the work I've already started but 
haven't yet completed. Feel free to assign this issue to yourself.

> Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed 
> tasks
> -
>
> Key: KAFKA-4793
> URL: https://issues.apache.org/jira/browse/KAFKA-4793
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Randall Hauch
>Priority: Major
>  Labels: needs-kip
>
> Sometimes tasks stop due to repeated failures. Users will want to restart the 
> connector and have it retry after fixing an issue. 
> We expected "POST /connectors/(string: name)/restart" to cause retry of 
> failed tasks, but this doesn't appear to be the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4793) Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed tasks

2021-06-02 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355860#comment-17355860
 ] 

Randall Hauch commented on KAFKA-4793:
--

I've created a KIP that expands the existing REST API method to restart a 
connector (the `Connector` instance, not the "named connector") with two new 
optional query parameters. Users can make one call using this method (with the 
query parameters) to request a combination of failed/all `Connector` and/or 
`Task` instances.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks
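As a hedged illustration of the combined call described above: KIP-745 names the two optional query parameters `includeTasks` and `onlyFailed`. A minimal sketch that only builds the restart URI (worker address and connector name are hypothetical; this is not the actual Connect client code):

```java
import java.net.URI;

public class RestartRequest {
    // Build the KIP-745 restart URI for a named connector.
    // includeTasks: also restart the connector's Task instances.
    // onlyFailed: restrict the restart to failed instances.
    public static URI restartUri(String baseUrl, String connector,
                                 boolean includeTasks, boolean onlyFailed) {
        return URI.create(baseUrl + "/connectors/" + connector
                + "/restart?includeTasks=" + includeTasks
                + "&onlyFailed=" + onlyFailed);
    }

    public static void main(String[] args) {
        // Hypothetical worker address; POST this URI to request the restart.
        System.out.println(restartUri("http://localhost:8083", "my-sink", true, true));
    }
}
```

POSTing such a URI with both parameters true requests a restart of only the failed `Connector` and `Task` instances in one call.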

> Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed 
> tasks
> -
>
> Key: KAFKA-4793
> URL: https://issues.apache.org/jira/browse/KAFKA-4793
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Randall Hauch
>Priority: Major
>  Labels: needs-kip
>
> Sometimes tasks stop due to repeated failures. Users will want to restart the 
> connector and have it retry after fixing an issue. 
> We expected "POST /connectors/(string: name)/restart" to cause retry of 
> failed tasks, but this doesn't appear to be the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4793) Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed tasks

2021-06-01 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355287#comment-17355287
 ] 

Randall Hauch commented on KAFKA-4793:
--

I'm picking this up and will add a KIP that proposes adding query parameters 
to the existing `/connectors/{name}/restart` API. One query parameter will 
specify whether to also include tasks when determining the instances to 
restart, and another will specify whether to restart only failed instances.

I will post the link to the KIP here as soon as I've created it.

> Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed 
> tasks
> -
>
> Key: KAFKA-4793
> URL: https://issues.apache.org/jira/browse/KAFKA-4793
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Randall Hauch
>Priority: Major
>  Labels: needs-kip
>
> Sometimes tasks stop due to repeated failures. Users will want to restart the 
> connector and have it retry after fixing an issue. 
> We expected "POST /connectors/(string: name)/restart" to cause retry of 
> failed tasks, but this doesn't appear to be the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7749) confluent does not provide option to set consumer properties at connector level

2021-06-01 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-7749.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

[KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy]
 introduced in AK 2.3.0 added support for connector-specific client overrides 
like the one described here.

Marking as resolved.
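KIP-458 introduced the `consumer.override.`, `producer.override.`, and `admin.override.` prefixes for per-connector client overrides, subject to the worker's `connector.client.config.override.policy`. A minimal sketch of a sink connector config using such an override (the connector class and topic are hypothetical, used only for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class ClientOverrideConfig {
    // Per KIP-458, keys prefixed with consumer.override. are applied only to
    // this connector's consumer; other connectors keep the worker defaults.
    public static Map<String, String> sinkConfig() {
        Map<String, String> config = new HashMap<>();
        config.put("connector.class", "io.example.ElasticsearchSinkConnector"); // hypothetical
        config.put("topics", "metrics"); // hypothetical
        config.put("consumer.override.max.poll.records", "500");
        return config;
    }
}
```

This is exactly the per-connector `max.poll.records` control requested in this issue, without touching the worker-wide properties.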

> confluent does not provide option to set consumer properties at connector 
> level
> ---
>
> Key: KAFKA-7749
> URL: https://issues.apache.org/jira/browse/KAFKA-7749
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Manjeet Duhan
>Priority: Major
> Fix For: 2.3.0
>
>
> _We want to increase consumer.max.poll.record to improve performance, but 
> this value can only be set in the worker properties, which apply to all 
> connectors in a given cluster._
> _Operative situation: We have one project which communicates with 
> Elasticsearch, and we set consumer.max.poll.record=500 after multiple 
> performance tests; this worked fine for a year._
>  _Then another project was onboarded into the same cluster, which required 
> consumer.max.poll.record=5000 based on its performance tests. This 
> configuration was moved to production._
>   _Admetric started failing, as it was taking more than 5 minutes to process 
> 5000 polled records and started throwing CommitFailedException, a vicious 
> cycle since the same data is processed over and over again._
> _We can control the above when starting a consumer from plain Java, but this 
> control was not available per consumer in the Confluent connector._
> _I have overridden Kafka code to accept connector-level properties which are 
> applied to a single connector while the others keep using the default 
> properties. These changes have been running in production for more than 5 
> months._
> _Some of the properties which were useful for us._
> max.poll.records
> max.poll.interval.ms
> request.timeout.ms
> key.deserializer
> value.deserializer
> heartbeat.interval.ms
> session.timeout.ms
> auto.offset.reset
> connections.max.idle.ms
> enable.auto.commit
>  
> auto.commit.interval.ms
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-4793) Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed tasks

2021-06-01 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-4793:


Assignee: Randall Hauch

> Kafka Connect: POST /connectors/(string: name)/restart doesn't start failed 
> tasks
> -
>
> Key: KAFKA-4793
> URL: https://issues.apache.org/jira/browse/KAFKA-4793
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Randall Hauch
>Priority: Major
>  Labels: needs-kip
>
> Sometimes tasks stop due to repeated failures. Users will want to restart the 
> connector and have it retry after fixing an issue. 
> We expected "POST /connectors/(string: name)/restart" to cause retry of 
> failed tasks, but this doesn't appear to be the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5892) Connector property override does not work unless setting ALL converter properties

2021-05-27 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352737#comment-17352737
 ] 

Randall Hauch commented on KAFKA-5892:
--

[~dc-heros], I've added your account as a contributor to the project, so you 
should now be able to self-assign KAFKA issues. Please try to assign this to 
yourself, and let me know here if you cannot. Thanks for volunteering to take 
this up!

> Connector property override does not work unless setting ALL converter 
> properties
> -
>
> Key: KAFKA-5892
> URL: https://issues.apache.org/jira/browse/KAFKA-5892
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Yeva Byzek
>Assignee: Jitendra Sahu
>Priority: Minor
>  Labels: newbie
>
> A single connector setting override {{value.converter.schemas.enable=false}} 
> only takes effect if ALL of the converter properties are overridden in the 
> connector.
> At minimum, we should give the user a warning or error that this will be ignored.
> We should also consider changing the behavior to allow the single property 
> override even if all the converter properties are not specified, but this 
> requires discussion to evaluate the impact of this change.
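As a hedged illustration of the workaround implied above: setting the converter class itself alongside the sub-property makes the override take effect (the issue notes that all converter properties may be needed, depending on the converter). The converter class shown is the standard JsonConverter; the shape of the config map is a sketch, not the worker's internal representation:

```java
import java.util.HashMap;
import java.util.Map;

public class ConverterOverride {
    // Per KAFKA-5892, value.converter.schemas.enable on its own is silently
    // ignored; overriding value.converter in the same connector config makes
    // the sub-property take effect.
    public static Map<String, String> connectorConfig() {
        Map<String, String> config = new HashMap<>();
        config.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        config.put("value.converter.schemas.enable", "false");
        return config;
    }
}
```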



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12252) Distributed herder tick thread loops rapidly when worker loses leadership

2021-05-06 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-12252.
---
Fix Version/s: 2.8.1
   3.0.0
   Resolution: Fixed

I'm still working on backporting this to the 2.7 and 2.6 branches. When I'm 
able to do that, I'll update the fix versions on this issue.

> Distributed herder tick thread loops rapidly when worker loses leadership
> -
>
> Key: KAFKA-12252
> URL: https://issues.apache.org/jira/browse/KAFKA-12252
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.8.1
>
>
> When a new session key is read from the config topic, if the worker is the 
> leader, it [schedules a new key 
> rotation|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1579-L1581].
>  The time between key rotations is configurable but defaults to an hour.
> The herder then continues its tick loop, which usually ends with a long poll 
> for rebalance activity. However, when a key rotation is scheduled, it will 
> [limit the time spent 
> polling|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L384-L388]
>  at the end of the tick loop in order to be able to perform the rotation.
> Once woken up, the worker checks to see if a key rotation is necessary and, 
> if so, [sets the expected key rotation time to 
> Long.MAX_VALUE|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L344],
>  then [writes a new session key to the config 
> topic|https://github.com/apache/kafka/blob/bf4afae8f53471ab6403cbbfcd2c4e427bdd4568/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L345-L348].
>  The problem is, [the worker only ever decides a key rotation is necessary if 
> it is still the 
> leader|https://github.com/apache/kafka/blob/5cf9cfcaba67cffa2435b07ade58365449c60bd9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L456-L469].
>  If the worker is no longer the leader at the time of the key rotation 
> (likely due to falling out of the cluster after losing contact with the group 
> coordinator), its key expiration time won’t be reset, and the long poll for 
> rebalance activity at the end of the tick loop will be given a timeout of 0 
> ms and result in the tick loop being immediately restarted. Even if the 
> worker reads a new session key from the config topic, it’ll continue looping 
> like this since its scheduled key rotation won’t be updated. At this point, 
> the only thing that would help the worker get back into a healthy state would 
> be if it were made the leader of the cluster again.
> One possible fix could be to add a conditional check in the tick thread to 
> only limit the time spent on rebalance polling if the worker is currently the 
> leader.
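The possible fix mentioned above — only limiting the rebalance poll when the worker is currently the leader — can be sketched as standalone timeout logic (method and field names are illustrative, not the actual DistributedHerder code):

```java
public class TickTimeout {
    // Compute how long the tick loop may block polling for rebalance activity.
    // keyExpiration and now are epoch millis; Long.MAX_VALUE means
    // "no key rotation scheduled".
    public static long pollTimeoutMs(boolean isLeader, long keyExpiration,
                                     long now, long defaultTimeoutMs) {
        // Only the leader performs key rotations, so only the leader should
        // cut the poll short; a non-leader with a stale keyExpiration would
        // otherwise spin with a 0 ms timeout, which is the bug described above.
        if (isLeader && keyExpiration != Long.MAX_VALUE) {
            return Math.max(0, Math.min(defaultTimeoutMs, keyExpiration - now));
        }
        return defaultTimeoutMs;
    }
}
```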



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10340) Source connectors should report error when trying to produce records to non-existent topics instead of hanging forever

2021-05-05 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339862#comment-17339862
 ] 

Randall Hauch commented on KAFKA-10340:
---

I cherry-picked the original PR (https://github.com/apache/kafka/pull/10016) to 
the `2.8` branch (now that it's not frozen) and updated the fixed versions.

This completes all of the planned work for this issue.

> Source connectors should report error when trying to produce records to 
> non-existent topics instead of hanging forever
> --
>
> Key: KAFKA-10340
> URL: https://issues.apache.org/jira/browse/KAFKA-10340
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.1, 2.7.0, 2.6.1, 2.8.0
>Reporter: Arjun Satish
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.7.1, 2.6.2, 2.8.1
>
>
> Currently, a source connector will blindly attempt to write a record to a 
> Kafka topic. When the topic does not exist, its creation is controlled by the 
> {{auto.create.topics.enable}} config on the brokers. When auto.create is 
> disabled, the producer.send() call on the Connect worker will hang 
> indefinitely (due to the "infinite retries" configuration for said producer). 
> In setups where this config is usually disabled, the source connector simply 
> appears to hang and not produce any output.
> It is desirable to log an info or error message (or otherwise inform the user) 
> that the connector is simply stuck waiting for the destination topic to be 
> created. When the worker has permission to inspect the broker settings, it 
> can use the {{listTopics}} and {{describeConfigs}} APIs in AdminClient to 
> check whether the topic exists and whether the broker can auto-create it 
> ({{auto.create.topics.enable}}), and throw an error if neither holds.
> With the recently merged 
> [KIP-158|https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics],
>  this becomes even more specific a corner case: when topic creation settings 
> are enabled, the worker should handle the corner case where topic creation is 
> disabled, {{auto.create.topics.enable}} is set to false and topic does not 
> exist.
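As a hedged illustration of the check described above (standalone decision logic only; in the real worker the inputs would come from AdminClient's {{listTopics}} and {{describeConfigs}}, and all names here are illustrative):

```java
import java.util.Set;

public class TopicPrecheck {
    // Decide whether producing to `topic` can ever succeed: the topic must
    // exist, or the broker must be allowed to auto-create it, or the worker's
    // own topic-creation settings (KIP-158) must be enabled. Otherwise the
    // producer would retry forever and the connector would appear to hang.
    public static void checkTopic(String topic, Set<String> existingTopics,
                                  boolean brokerAutoCreate,
                                  boolean workerTopicCreation) {
        if (existingTopics.contains(topic) || brokerAutoCreate || workerTopicCreation) {
            return; // the send can proceed, or the topic will be created
        }
        throw new IllegalStateException("Topic '" + topic + "' does not exist and "
                + "auto.create.topics.enable is false; the producer would hang");
    }
}
```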



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10340) Source connectors should report error when trying to produce records to non-existent topics instead of hanging forever

2021-05-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-10340:
--
Fix Version/s: 2.8.1

> Source connectors should report error when trying to produce records to 
> non-existent topics instead of hanging forever
> --
>
> Key: KAFKA-10340
> URL: https://issues.apache.org/jira/browse/KAFKA-10340
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.5.1, 2.7.0, 2.6.1, 2.8.0
>Reporter: Arjun Satish
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.7.1, 2.6.2, 2.8.1
>
>
> Currently, a source connector will blindly attempt to write a record to a 
> Kafka topic. When the topic does not exist, its creation is controlled by the 
> {{auto.create.topics.enable}} config on the brokers. When auto.create is 
> disabled, the producer.send() call on the Connect worker will hang 
> indefinitely (due to the "infinite retries" configuration for said producer). 
> In setups where this config is usually disabled, the source connector simply 
> appears to hang and not produce any output.
> It is desirable to log an info or error message (or otherwise inform the user) 
> that the connector is simply stuck waiting for the destination topic to be 
> created. When the worker has permission to inspect the broker settings, it 
> can use the {{listTopics}} and {{describeConfigs}} APIs in AdminClient to 
> check whether the topic exists and whether the broker can auto-create it 
> ({{auto.create.topics.enable}}), and throw an error if neither holds.
> With the recently merged 
> [KIP-158|https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics],
>  this becomes even more specific a corner case: when topic creation settings 
> are enabled, the worker should handle the corner case where topic creation is 
> disabled, {{auto.create.topics.enable}} is set to false and topic does not 
> exist.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-10231) Broken Kafka Connect node to node communication if invalid hostname is in place

2021-04-19 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-10231:
--
Description: 
As a Kafka Connect operator I would expect a more definitive error when the 
internal node to node communication can't happen.

If the hostname contains an invalid character according to the [RFC1123 section 
2.1|#page-13], the error raised by the Kafka Connect worker node look like:

 
{quote}{{[2020-06-30 10:38:49,990] ERROR Uncaught exception in REST call to 
/connectors 
(org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper)}}{{java.lang.IllegalArgumentException:
 Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)}}{{at 
org.eclipse.jetty.client.HttpClient.checkHost(HttpClient.java:506)}}{{
at org.eclipse.jetty.client.HttpClient.newHttpRequest(HttpClient.java:491)}}{{  
  at 
org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:449)}}{{
at org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:438)}}{{  
  at 
org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:83)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:309)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:138)}}
{quote}
 

it would be much nicer for operators that such situations are detected in 
similar, or improved, version as the JVM is doing in the [IDN class|#L291]].

 

  was:
As a Kafka Connect operator I would expect a more definitive error when the 
internal node to node communication can't happen.

If the hostname contains an invalid character according to the [RFC1123 section 
2.1|#page-13]], the error raised by the Kafka Connect worker node look like:

 
{quote}{{[2020-06-30 10:38:49,990] ERROR Uncaught exception in REST call to 
/connectors 
(org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper)}}{{java.lang.IllegalArgumentException:
 Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)}}{{at 
org.eclipse.jetty.client.HttpClient.checkHost(HttpClient.java:506)}}{{
at org.eclipse.jetty.client.HttpClient.newHttpRequest(HttpClient.java:491)}}{{  
  at 
org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:449)}}{{
at org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:438)}}{{  
  at 
org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:83)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:309)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:138)}}
{quote}
 

it would be much nicer for operators that such situations are detected in 
similar, or improved, version as the JVM is doing in the [IDN class|#L291]].

 


> Broken Kafka Connect node to node communication if invalid hostname is in 
> place
> ---
>
> Key: KAFKA-10231
> URL: https://issues.apache.org/jira/browse/KAFKA-10231
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.3.1, 2.5.0, 2.4.1
>Reporter: Pere Urbon-Bayes
>Assignee: Kalpesh Patel
>Priority: Minor
>
> As a Kafka Connect operator I would expect a more definitive error when the 
> internal node to node communication can't happen.
> If the hostname contains an invalid character according to [RFC 1123 section 
> 2.1|https://tools.ietf.org/html/rfc1123#page-13], the error raised by the 
> Kafka Connect worker node looks like:
>  
> {quote}{{[2020-06-30 10:38:49,990] ERROR Uncaught exception in REST call to 
> /connectors 
> (org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper)}}{{java.lang.IllegalArgumentException:
>  Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)}}{{at 
> org.eclipse.jetty.client.HttpClient.checkHost(HttpClient.java:506)}}{{
> at 
> org.eclipse.jetty.client.HttpClient.newHttpRequest(HttpClient.java:491)}}{{   
>  at 
> org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:449)}}{{   
>  at org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:438)}}{{   
>  at 
> org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:83)}}{{
> at 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:309)}}{{
> at 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:138)}}
> {quote}
>  
> it would be much nicer for operators that such situations are detected in 
> similar, or 

[jira] [Updated] (KAFKA-10231) Broken Kafka Connect node to node communication if invalid hostname is in place

2021-04-19 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-10231:
--
Description: 
As a Kafka Connect operator I would expect a more definitive error when the 
internal node to node communication can't happen.

If the hostname contains an invalid character according to the [RFC1123 section 
2.1|https://tools.ietf.org/html/rfc1123#page-13], the error raised by the Kafka 
Connect worker node look like:

 
{quote}{{[2020-06-30 10:38:49,990] ERROR Uncaught exception in REST call to 
/connectors 
(org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper)}}{{java.lang.IllegalArgumentException:
 Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)}}{{at 
org.eclipse.jetty.client.HttpClient.checkHost(HttpClient.java:506)}}{{
at org.eclipse.jetty.client.HttpClient.newHttpRequest(HttpClient.java:491)}}{{  
  at 
org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:449)}}{{
at org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:438)}}{{  
  at 
org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:83)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:309)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:138)}}
{quote}
 

it would be much nicer for operators that such situations are detected in 
similar, or improved, version as the JVM is doing in the [IDN class|#L291]].

 

  was:
As a Kafka Connect operator I would expect a more definitive error when the 
internal node to node communication can't happen.

If the hostname contains an invalid character according to the [RFC1123 section 
2.1|#page-13], the error raised by the Kafka Connect worker node look like:

 
{quote}{{[2020-06-30 10:38:49,990] ERROR Uncaught exception in REST call to 
/connectors 
(org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper)}}{{java.lang.IllegalArgumentException:
 Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)}}{{at 
org.eclipse.jetty.client.HttpClient.checkHost(HttpClient.java:506)}}{{
at org.eclipse.jetty.client.HttpClient.newHttpRequest(HttpClient.java:491)}}{{  
  at 
org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:449)}}{{
at org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:438)}}{{  
  at 
org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:83)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:309)}}{{
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:138)}}
{quote}
 

it would be much nicer for operators that such situations are detected in 
similar, or improved, version as the JVM is doing in the [IDN class|#L291]].

 


> Broken Kafka Connect node to node communication if invalid hostname is in 
> place
> ---
>
> Key: KAFKA-10231
> URL: https://issues.apache.org/jira/browse/KAFKA-10231
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0, 2.3.1, 2.5.0, 2.4.1
>Reporter: Pere Urbon-Bayes
>Assignee: Kalpesh Patel
>Priority: Minor
>
> As a Kafka Connect operator I would expect a more definitive error when the 
> internal node to node communication can't happen.
> If the hostname contains an invalid character according to [RFC 1123 
> section 2.1|https://tools.ietf.org/html/rfc1123#page-13], the error raised by 
> the Kafka Connect worker node looks like:
>  
> {quote}{{[2020-06-30 10:38:49,990] ERROR Uncaught exception in REST call to 
> /connectors 
> (org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper)}}{{java.lang.IllegalArgumentException:
>  Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)}}{{at 
> org.eclipse.jetty.client.HttpClient.checkHost(HttpClient.java:506)}}{{
> at 
> org.eclipse.jetty.client.HttpClient.newHttpRequest(HttpClient.java:491)}}{{   
>  at 
> org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:449)}}{{   
>  at org.eclipse.jetty.client.HttpClient.newRequest(HttpClient.java:438)}}{{   
>  at 
> org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:83)}}{{
> at 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:309)}}{{
> at 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:138)}}
> {quote}
>  
> it would be much nicer for 
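The root cause is visible with plain `java.net.URI`: an underscore is not a legal hostname character, so the URI's authority is treated as registry-based and `getHost()` returns null — exactly the `Invalid URI host: null (authority: kafka_connect-0.dev-2:8083)` in the stack trace above. A minimal demonstration:

```java
import java.net.URI;

public class HostCheck {
    public static void main(String[] args) {
        // Underscores are not legal in hostnames, so URI treats the authority
        // as registry-based and getHost() comes back null.
        URI bad = URI.create("http://kafka_connect-0.dev-2:8083");
        URI good = URI.create("http://kafka-connect-0.dev-2:8083");
        System.out.println(bad.getHost());   // null
        System.out.println(good.getHost());  // kafka-connect-0.dev-2
    }
}
```

A worker could run this kind of check on its advertised host at startup and fail fast with a descriptive message instead of surfacing the Jetty exception.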

[jira] [Updated] (KAFKA-6985) Error connection between cluster node

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-6985:
-
Component/s: (was: KafkaConnect)
 core

> Error connection between cluster node
> -
>
> Key: KAFKA-6985
> URL: https://issues.apache.org/jira/browse/KAFKA-6985
> Project: Kafka
>  Issue Type: Bug
>  Components: core
> Environment: Centos-7
>Reporter: Ranjeet Ranjan
>Priority: Major
>
> Hi, I have set up a multi-node Kafka cluster but am getting an error while 
> connecting one node to another, although there is no issue with the firewall 
> or port. I am able to telnet.
> WARN [ReplicaFetcherThread-0-1], Error in fetch 
> Kafka.server.ReplicaFetcherThread$FetchRequest@8395951 
> (Kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to Kafka-1:9092 (id: 1 rack: null) failed
>  
> {code:java}
>  
> at 
> kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:84)
> at 
> kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:94)
> at 
> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> {code}
> Here you go server.properties
> Node:1
>  
> {code:java}
> # Server Basics #
> # The id of the broker. This must be set to a unique integer for each broker.
> broker.id=1
> # Switch to enable topic deletion or not, default value is false
> delete.topic.enable=true
> # Socket Server Settings 
> #
> listeners=PLAINTEXT://kafka-1:9092
> advertised.listeners=PLAINTEXT://kafka-1:9092
> #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> # The number of threads handling network requests
> num.network.threads=3
> # The number of threads doing disk I/O
> num.io.threads=8
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=102400
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=102400
> # The maximum size of a request that the socket server will accept 
> (protection against OOM)
> socket.request.max.bytes=104857600
> # Log Basics #
> # A comma seperated list of directories under which to store log files
> log.dirs=/var/log/kafka
> # The default number of log partitions per topic. More partitions allow 
> greater
> # parallelism for consumption, but this will also result in more files across
> # the brokers.
> num.partitions=1
> # The number of threads per data directory to be used for log recovery at 
> startup and flushing at shutdown.
> # This value is recommended to be increased for installations with data dirs 
> located in RAID array.
> num.recovery.threads.per.data.dir=1
> # Log Retention Policy 
> #
> # The minimum age of a log file to be eligible for deletion due to age
> log.retention.hours=48
> # A size-based retention policy for logs. Segments are pruned from the log as 
> long as the remaining
> # segments don't drop below log.retention.bytes. Functions independently of 
> log.retention.hours.
> log.retention.bytes=1073741824
> # The maximum size of a log segment file. When this size is reached a new log 
> segment will be created.
> log.segment.bytes=1073741824
> # The interval at which log segments are checked to see if they can be 
> deleted according
> # to the retention policies
> log.retention.check.interval.ms=30
> # Zookeeper #
> # root directory for all kafka znodes.
> zookeeper.connect=10.130.82.28:2181
> # Timeout in ms for connecting to zookeeper
> zookeeper.connection.timeout.ms=6000
> {code}
>  
>  
> Node-2
> {code:java}
> # Server Basics #
> # The id of the broker. This must be set to a unique integer for each broker.
> broker.id=2
> # Switch to enable topic deletion or not, default value is false
> delete.topic.enable=true
> # Socket Server Settings 
> #
> listeners=PLAINTEXT://kafka-2:9092
> advertised.listeners=PLAINTEXT://kafka-2:9092
> #listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
> # 

[jira] [Resolved] (KAFKA-8551) Comments for connectors() in Herder interface

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-8551.
--
Resolution: Won't Fix

Marking as won't fix, since the details are insufficient to try to address.

> Comments for connectors() in Herder interface 
> --
>
> Key: KAFKA-8551
> URL: https://issues.apache.org/jira/browse/KAFKA-8551
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.2.1
>Reporter: Luying Liu
>Priority: Major
>
> There are mistakes in the comments for connectors() in the Herder interface. 
> The mistakes are in the file 
> {{connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Herder.java}} 
> in the [kafka|https://github.com/apache/kafka] repository.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8664) non-JSON format messages when streaming data from Kafka to Mongo

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-8664.
--
Resolution: Won't Fix

The reported problem is for a connector implementation that is not owned by the 
Apache Kafka project. Please report the issue with the provider of the 
connector.

> non-JSON format messages when streaming data from Kafka to Mongo
> 
>
> Key: KAFKA-8664
> URL: https://issues.apache.org/jira/browse/KAFKA-8664
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.2.0
>Reporter: Vu Le
>Priority: Major
> Attachments: MongoSinkConnector.properties, 
> log_error_when_stream_data_not_a_json_format.txt
>
>
> Hi team,
> I can stream data from Kafka to MongoDB with JSON messages. I use MongoDB 
> Kafka Connector 
> ([https://github.com/mongodb/mongo-kafka/blob/master/docs/install.md])
> However, if I send a non-JSON format message, the connector dies. Please see
> the attached log file for details.
> My config file:
> {code:java}
> name=mongo-sink
> topics=test
> connector.class=com.mongodb.kafka.connect.MongoSinkConnector
> tasks.max=1
> key.ignore=true
> # Specific global MongoDB Sink Connector configuration
> connection.uri=mongodb://localhost:27017
> database=test_kafka
> collection=transaction
> max.num.retries=3
> retries.defer.timeout=5000
> type.name=kafka-connect
> key.converter=org.apache.kafka.connect.json.JsonConverter
> key.converter.schemas.enable=false
> value.converter=org.apache.kafka.connect.json.JsonConverter
> value.converter.schemas.enable=false
> {code}
> I have 2 separate questions:
>  # how to ignore messages which are in non-JSON format?
>  # how to define a default key for this kind of message (for example: abc ->
> \{ "non-json": "abc" } )
> Thanks
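[Editor's note: one way to address the first question above, as a sketch. Since Kafka 2.0 (KIP-298), the Connect framework's error-handling options can tolerate records that the converter fails to deserialize and optionally route them to a dead letter queue; the DLQ topic name below is illustrative.]

```properties
# Hypothetical additions to the sink config above (Connect >= 2.0, KIP-298):
# tolerate conversion failures instead of killing the task ...
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true
# ... and optionally capture the raw failed records in a dead letter queue
errors.deadletterqueue.topic.name=mongo-sink-dlq
errors.deadletterqueue.context.headers.enable=true
```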





[jira] [Resolved] (KAFKA-8867) Kafka Connect JDBC fails to create PostgreSQL table with default boolean value in schema

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-8867.
--
Resolution: Won't Fix

The reported problem is for the Confluent JDBC source/sink connector, and 
should be reported via that connector's GitHub repository issues.

> Kafka Connect JDBC fails to create PostgreSQL table with default boolean 
> value in schema
> 
>
> Key: KAFKA-8867
> URL: https://issues.apache.org/jira/browse/KAFKA-8867
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0
>Reporter: Tudor
>Priority: Major
>
> The `CREATE TABLE ..` statement generated for JDBC sink connectors
> configured with `auto.create: true` contains field declarations that do not
> conform to PostgreSQL syntax for fields of type boolean with default values.
> Example record value Avro schema:
> {code:java}
> {
>   "namespace": "com.test.avro.schema.v1",
>   "type": "record",
>   "name": "SomeEvent",
>   "fields": [
> {
>   "name": "boolean_field",
>   "type": "boolean",
>   "default": false
> }
>   ]
> }
> {code}
> The connector task fails with:  
> {code:java}
> ERROR WorkerSinkTask{id=test-events-sink-0} RetriableException from SinkTask: 
> (org.apache.kafka.connect.runtime.WorkerSinkTask:551)
> org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: 
> org.postgresql.util.PSQLException: ERROR: column "boolean_field" is of type 
> boolean but default expression is of type integer
>   Hint: You will need to rewrite or cast the expression.
>   at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:93)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748){code}
>  
> The generated SQL statement is: 
> {code:java}
> CREATE TABLE "test_data" ("boolean_field" BOOLEAN DEFAULT 0){code}
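[Editor's note: for comparison, a form PostgreSQL does accept uses a boolean literal rather than an integer for the default. This is a sketch of the expected statement, not the connector's actual fix; the table and column names mirror the example above.]

```sql
CREATE TABLE "test_data" ("boolean_field" BOOLEAN DEFAULT false);
```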





[jira] [Resolved] (KAFKA-8961) Unable to create secure JDBC connection through Kafka Connect

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-8961.
--
Resolution: Won't Fix

This is not a problem of the Connect framework, and is instead an issue with 
the connector implementation – or more likely the _installation_ of the 
connector in the user's environment.

> Unable to create secure JDBC connection through Kafka Connect
> -
>
> Key: KAFKA-8961
> URL: https://issues.apache.org/jira/browse/KAFKA-8961
> Project: Kafka
>  Issue Type: Bug
>  Components: build, clients, KafkaConnect, network
>Affects Versions: 2.2.1
>Reporter: Monika Bainsala
>Priority: Major
>
> As per the article below, for enabling a secure JDBC connection we can use an
> updated URL parameter while calling the create connector REST API.
> Example:
> jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=YES)(FAILOVER=YES)(ADDRESS=(PROTOCOL=tcp)(HOST=X)(PORT=1520)))(CONNECT_DATA=(SERVICE_NAME=XXAP)));EncryptionLevel=requested;EncryptionTypes=RC4_256;DataIntegrityLevel=requested;DataIntegrityTypes=MD5"
>  
> But this approach is not working currently, kindly help in resolving this 
> issue.
>  
> Reference :
> [https://docs.confluent.io/current/connect/kafka-connect-jdbc/source-connector/source_config_options.html]
>  





[jira] [Updated] (KAFKA-9017) We see timeout in kafka in production cluster

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9017:
-
Component/s: (was: KafkaConnect)
 core

> We see timeout in kafka in production cluster
> -
>
> Key: KAFKA-9017
> URL: https://issues.apache.org/jira/browse/KAFKA-9017
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
> Environment: Production
>Reporter: Suhas
>Priority: Critical
> Attachments: stderr (7), stdout (12)
>
>
> We see timeouts in Kafka in the production cluster. Kafka is running on
> DC/OS (Mesos), and below are the errors:
> *+Exception 1: This from application logs+*
> 2019-10-07 10:01:59 Error: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
> ie-lrx-audit-evt-3: 30030 ms has passed since batch creation plus linger time
> *+Exception 2:This from application logs+*
>  {"eventTime":"2019-10-07 08:20:43.265", "logType":"ERROR", "stackMessage" : 
> "java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
> ie-lrx-audit-evt-3: 30028 ms has passed since batch creation plus linger 
> time", "stackTrace" : 
> *+Exception (from log) We see this logs on broker logs+*
> [2019-10-10 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2, 
> fetcherId=0] Error sending fetch request (sessionId=919177392, epoch=INITIAL) 
> to node 2: java.io.IOException: Connection to 2 was disconnected before the 
> response was read. (org.apache.kafka.clients.FetchSessionHandler)[2019-10-10 
> 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] 
> Error sending fetch request (sessionId=919177392, epoch=INITIAL) to node 2: 
> java.io.IOException: Connection to 2 was disconnected before the response was 
> read. (org.apache.kafka.clients.FetchSessionHandler)[2019-10-10 06:32:10,849] 
> WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error in response 
> for fetch request (type=FetchRequest, replicaId=4, maxWait=500, minBytes=1, 
> maxBytes=10485760, fetchData=\{ie-lrx-rxer-audit-evt-0=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), 
> mft-hdfs-landing-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[108]), dca-audit-evt-2=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), 
> it-sou-audit-evt-7=(offset=94819, logStartOffset=94819, maxBytes=1048576, 
> currentLeaderEpoch=Optional[100]), intg-ie-lrx-rxer-audit-evt-2=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[78]), 
> prod-pipelines-errors-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[117]), __consumer_offsets-36=(offset=3, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), 
> panel-data-change-evt-4=(offset=0, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[108]), gdcp-notification-evt-2=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), 
> data-transfer-change-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[108]), __consumer_offsets-11=(offset=15, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), 
> dca-heartbeat-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[105]), ukwhs-error-topic-1=(offset=8, 
> logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[105]), 
> intg-ie-lrx-audit-evt-4=(offset=21, logStartOffset=21, maxBytes=1048576, 
> currentLeaderEpoch=Optional[74]), __consumer_offsets-16=(offset=11329814, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), 
> __consumer_offsets-31=(offset=3472033, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[107]), ukpai-hdfs-evt-1=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[107]), 
> mft-pflow-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, 
> currentLeaderEpoch=Optional[108]), ukwhs-hdfs-landing-evt-01-2=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[105]), 
> it-sou-audit-evt-2=(offset=490084, logStartOffset=490084, maxBytes=1048576, 
> currentLeaderEpoch=Optional[105]), ie-lrx-pat-audit-evt-4=(offset=0, 
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104])}, 
> isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=919177392, 
> epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)java.io.IOException: 
> Connection to 2 was disconnected before the response was read at 
> 

[jira] [Commented] (KAFKA-10715) Support Kafka connect converter for AVRO

2021-04-05 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315110#comment-17315110
 ] 

Randall Hauch commented on KAFKA-10715:
---

There are multiple Converter implementations outside of Kafka, and IMO there is 
no need for Kafka to own and maintain its own version of these when those other 
existing implementations can easily be used by simply installing them.

This is similar to how the Kafka project provides only example Connector 
implementations. See the [rejected alternatives of 
KIP-26|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767#KIP26AddKafkaConnectframeworkfordataimport/export-Maintainconnectorsintheprojectalongwithframework],
 which introduced the Connect framework (with Converters).

Therefore, I'm going to close this.

> Support Kafka connect converter for AVRO
> 
>
> Key: KAFKA-10715
> URL: https://issues.apache.org/jira/browse/KAFKA-10715
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ravindranath Kakarla
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I want to add support for an Avro data format converter to Kafka Connect. Right
> now, Kafka Connect supports a [JSON converter|https://github.com/apache/kafka/tree/trunk/connect].
> Since Avro is a commonly used data format with Kafka, it would be great to have
> support for it.
>  
> Confluent Schema Registry libraries have 
> [support|https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java]
>  for it. The code seems to be pretty generic and can be used directly with 
> Kafka connect without schema registry. They are also licensed under Apache 
> 2.0.
>  
> Can they be copied to this repository and made available for all users of 
> Kafka Connect?





[jira] [Resolved] (KAFKA-10715) Support Kafka connect converter for AVRO

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-10715.
---
Resolution: Won't Do

> Support Kafka connect converter for AVRO
> 
>
> Key: KAFKA-10715
> URL: https://issues.apache.org/jira/browse/KAFKA-10715
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ravindranath Kakarla
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I want to add support for an Avro data format converter to Kafka Connect. Right
> now, Kafka Connect supports a [JSON converter|https://github.com/apache/kafka/tree/trunk/connect].
> Since Avro is a commonly used data format with Kafka, it would be great to have
> support for it.
>  
> Confluent Schema Registry libraries have 
> [support|https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java]
>  for it. The code seems to be pretty generic and can be used directly with 
> Kafka connect without schema registry. They are also licensed under Apache 
> 2.0.
>  
> Can they be copied to this repository and made available for all users of 
> Kafka Connect?





[jira] [Updated] (KAFKA-9988) Connect incorrectly logs that task has failed when one takes too long to shutdown

2021-04-05 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9988:
-
Labels: newbie  (was: )

> Connect incorrectly logs that task has failed when one takes too long to 
> shutdown
> -
>
> Key: KAFKA-9988
> URL: https://issues.apache.org/jira/browse/KAFKA-9988
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0, 2.4.0, 2.3.1, 2.2.3, 2.5.0, 2.3.2, 2.4.1, 2.4.2, 
> 2.5.1
>Reporter: Sanjana Kaundinya
>Priority: Major
>  Labels: newbie
>
> If the OffsetStorageReader is closed while the task is trying to shutdown, 
> and the task is trying to access the offsets from the OffsetStorageReader, 
> then we see the following in the logs.
> {code:java}
> [2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=connector-18} Task threw 
> an uncaught and unrecoverable exception 
> (org.apache.kafka.connect.runtime.WorkerTask)
> org.apache.kafka.connect.errors.ConnectException: Failed to fetch offsets.
> at 
> org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:114)
> at 
> org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offset(OffsetStorageReaderImpl.java:63)
> at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:205)
> at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
> at 
> org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.connect.errors.ConnectException: Offset reader 
> closed while attempting to read offsets. This is likely because the task was 
> been scheduled to stop but has taken longer than the graceful shutdown period 
> to do so.
> at 
> org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:103)
> ... 14 more
> [2020-05-05 05:28:58,937] ERROR WorkerSourceTask{id=connector-18} Task is 
> being killed and will not recover until manually restarted 
> (org.apache.kafka.connect.runtime.WorkerTask)
> {code}
> This is a bit misleading, because the task is already being shut down and
> doesn't actually need manual intervention to be restarted. We can see this
> later in the logs, where it throws another unrecoverable exception.
> {code:java}
> [2020-05-05 05:40:39,361] ERROR WorkerSourceTask{id=connector-18} Task threw 
> an uncaught and unrecoverable exception 
> (org.apache.kafka.connect.runtime.WorkerTask)
> {code}
> If we know a task is on its way to shutting down, we should not throw a
> ConnectException; instead we should log a warning so that we don't log false
> negatives.
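[Editor's note: the proposed behavior can be sketched as follows. The class and method names are illustrative, not the actual Connect runtime (OffsetStorageReaderImpl) API.]

```java
// Sketch of the proposal above: when the offset reader was closed because the
// task is already stopping, log a warning and return nothing instead of
// throwing an unrecoverable exception. Names are illustrative.
import java.util.Collections;
import java.util.Map;

public class GracefulOffsetRead {
    private volatile boolean closed = false;

    public void close() {
        closed = true;
    }

    /** Returns an empty map with a warning, rather than throwing, once closed. */
    public Map<String, Long> offsets() {
        if (closed) {
            // Previously this path threw ConnectException, which the worker logged as
            // "Task is being killed and will not recover until manually restarted".
            System.out.println("WARN Offset reader closed while task was stopping; returning no offsets");
            return Collections.emptyMap();
        }
        return Map.of("partition-0", 42L);
    }

    public static void main(String[] args) {
        GracefulOffsetRead reader = new GracefulOffsetRead();
        System.out.println(reader.offsets().size()); // 1 before close
        reader.close();
        System.out.println(reader.offsets().size()); // 0 after close, no exception
    }
}
```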





[jira] [Commented] (KAFKA-12474) Worker can die if unable to write new session key

2021-04-01 Thread Randall Hauch (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313476#comment-17313476
 ] 

Randall Hauch commented on KAFKA-12474:
---

Merged to the `trunk` branch after the `2.8` branch was created and after code 
freeze, and merged to:
 * the `2.5` branch (for inclusion in upcoming 2.5.2), 
 * the `2.6` branch (for inclusion in upcoming 2.6.2), 
 * the `2.7` branch (for inclusion in upcoming 2.7.1), 
 * the `2.8` branch (for inclusion in upcoming 2.8.0) w/ permission from RM 

> Worker can die if unable to write new session key
> -
>
> Key: KAFKA-12474
> URL: https://issues.apache.org/jira/browse/KAFKA-12474
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>
>
> If a distributed worker is unable to write (and then read back) a new session 
> key to the config topic, an uncaught exception will be thrown from its 
> herder's tick thread, killing the worker.
> See 
> [https://github.com/apache/kafka/blob/8da65936d7fc53d24c665c0d01893d25a430933b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L366-L369]
> One way we can handle this case is by forcing a read to the end of the config
> topic whenever an attempt to write a new session key to the config topic
> fails.
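[Editor's note: the mitigation described above can be sketched like this. `ConfigLog`, `writeSessionKey`, and `readToEnd` are illustrative names, not the real DistributedHerder API.]

```java
// Sketch of the fix idea: if writing a new session key fails, catch the
// failure and read to the end of the config topic (another worker may have
// rotated the key first), instead of letting the exception kill the herder's
// tick thread. All names here are illustrative.
public class SessionKeyRotation {
    interface ConfigLog {
        void writeSessionKey() throws Exception;
        void readToEnd();
    }

    /** Returns true if this worker wrote the key, false if it fell back to a re-read. */
    static boolean rotateKey(ConfigLog log) {
        try {
            log.writeSessionKey();
            return true;
        } catch (Exception e) {
            // Do not propagate: catching up on the config topic lets the worker
            // pick up whichever key actually got written.
            log.readToEnd();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean[] reRead = {false};
        ConfigLog failing = new ConfigLog() {
            @Override public void writeSessionKey() throws Exception { throw new Exception("write failed"); }
            @Override public void readToEnd() { reRead[0] = true; }
        };
        System.out.println(rotateKey(failing)); // false: fell back to re-read
        System.out.println(reRead[0]);          // true
    }
}
```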





[jira] [Updated] (KAFKA-12474) Worker can die if unable to write new session key

2021-04-01 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-12474:
--
Fix Version/s: 2.8.0

> Worker can die if unable to write new session key
> -
>
> Key: KAFKA-12474
> URL: https://issues.apache.org/jira/browse/KAFKA-12474
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.0, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.0.0, 2.5.2, 2.8.0, 2.7.1, 2.6.2
>
>
> If a distributed worker is unable to write (and then read back) a new session 
> key to the config topic, an uncaught exception will be thrown from its 
> herder's tick thread, killing the worker.
> See 
> [https://github.com/apache/kafka/blob/8da65936d7fc53d24c665c0d01893d25a430933b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L366-L369]
> One way we can handle this case is by forcing a read to the end of the config
> topic whenever an attempt to write a new session key to the config topic
> fails.





  1   2   3   4   5   6   7   8   9   10   >