[jira] [Updated] (KAFKA-16603) Data loss when kafka connect sending data to Kafka

2024-04-22 Thread Anil Dasari (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anil Dasari updated KAFKA-16603:

Description: 
We are experiencing data loss when the Kafka source connector fails to send 
data to the Kafka data topic and the offsets topic. 

Kafka cluster and Kafka Connect details (a topic-setup sketch follows the list):
 # Kafka Connect (client) version: Confluent Community 7.3.1, i.e. Kafka 3.3.1
 # Kafka broker version: 0.11.0
 # Cluster size: 3 brokers
 # Number of partitions in all topics: 3
 # Replication factor: 3
 # Min ISR: 2
 # No transformations configured in the connector
 # Default error tolerance, i.e. none
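
For reference, here is a minimal sketch, assuming the standard Kafka AdminClient, of how a topic matching the settings above (3 partitions, replication factor 3, min.insync.replicas=2) could be created. The topic name and bootstrap address are placeholders, not values from this ticket:

{code:java}
// Hypothetical setup sketch (not from this ticket): creates a topic with the
// partition count, replication factor, and min ISR listed above.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TopicSetupSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; replace with the real cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("example-topic", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
{code}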

Our connector checkpoints the offset information received in SourceTask#commitRecord 
and resumes data processing from the persisted checkpoint.
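
For context, a minimal sketch of the checkpointing pattern described above; this is hypothetical connector code, not the actual connector from this ticket. SourceTask#commitRecord is invoked by the Connect framework after the producer acknowledges a record, so the task remembers that offset and persists it through a connector-specific hook:

{code:java}
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public abstract class CheckpointingSourceTask extends SourceTask {

    // Last source offset known to be acknowledged by the Kafka producer.
    private final AtomicReference<Map<String, ?>> lastAckedOffset = new AtomicReference<>();

    @Override
    public void commitRecord(SourceRecord record, RecordMetadata metadata) {
        // Called by the framework after the producer ack for this record.
        lastAckedOffset.set(record.sourceOffset());
    }

    @Override
    public void commit() throws InterruptedException {
        Map<String, ?> offset = lastAckedOffset.get();
        if (offset != null) {
            persistCheckpoint(offset); // hypothetical hook: external checkpoint store
        }
    }

    // Hypothetical: how and where the checkpoint is stored is connector-specific.
    protected abstract void persistCheckpoint(Map<String, ?> offset);
}
{code}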

The data loss was noticed when a broker was unresponsive for a few minutes due to 
high load and the Kafka connector was restarted. However, the connector's graceful 
shutdown failed.

Logs:

 
{code:java}
[Worker clientId=connect-1, groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] 
Discovered group coordinator 10.75.100.176:31000 (id: 2147483647 rack: null)
Apr 22, 2024 @ 15:56:16.152 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Group coordinator 
10.75.100.176:31000 (id: 2147483647 rack: null) is unavailable or invalid due 
to cause: coordinator unavailable. isDisconnected: false. Rediscovery will be 
attempted.
Apr 22, 2024 @ 15:56:16.153 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Requesting disconnect from 
last known coordinator 10.75.100.176:31000 (id: 2147483647 rack: null)
Apr 22, 2024 @ 15:56:16.514 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Node 0 disconnected.
Apr 22, 2024 @ 15:56:16.708 [Producer 
clientId=connector-producer-d094a5d7bbb046b99d62398cb84d648c-0] Node 0 
disconnected.
Apr 22, 2024 @ 15:56:16.710 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Node 2147483647 disconnected.
Apr 22, 2024 @ 15:56:16.731 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Group coordinator 
10.75.100.176:31000 (id: 2147483647 rack: null) is unavailable or invalid due 
to cause: coordinator unavailable. isDisconnected: true. Rediscovery will be 
attempted.
Apr 22, 2024 @ 15:56:19.103 == Trying to sleep while stop == (** custom log **)
Apr 22, 2024 @ 15:56:19.755 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Broker coordinator was 
unreachable for 3000ms. Revoking previous assignment Assignment{error=0, 
leader='connect-1-8f41a1d2-6cc9-4956-9be3-1fbae9c6d305', 
leaderUrl='http://10.75.100.46:8083/', offset=4, 
connectorIds=[d094a5d7bbb046b99d62398cb84d648c], 
taskIds=[d094a5d7bbb046b99d62398cb84d648c-0], revokedConnectorIds=[], 
revokedTaskIds=[], delay=0} to avoid running tasks while not being a member the 
group
Apr 22, 2024 @ 15:56:19.866 Stopping connector d094a5d7bbb046b99d62398cb84d648c
Apr 22, 2024 @ 15:56:19.874 Stopping task d094a5d7bbb046b99d62398cb84d648c-0
Apr 22, 2024 @ 15:56:19.880 Scheduled shutdown for 
WorkerConnector{id=d094a5d7bbb046b99d62398cb84d648c}
Apr 22, 2024 @ 15:56:24.105 Connector 'd094a5d7bbb046b99d62398cb84d648c' failed 
to properly shut down, has become unresponsive, and may be consuming external 
resources. Correct the configuration for this connector or remove the 
connector. After fixing the connector, it may be necessary to restart this 
worker to release any consumed resources.
Apr 22, 2024 @ 15:56:24.110 [Producer 
clientId=connector-producer-d094a5d7bbb046b99d62398cb84d648c-0] Closing the 
Kafka producer with timeoutMillis = 0 ms.
Apr 22, 2024 @ 15:56:24.110 [Producer 
clientId=connector-producer-d094a5d7bbb046b99d62398cb84d648c-0] Proceeding to 
force close the producer since pending requests could not be completed within 
timeout 0 ms.
Apr 22, 2024 @ 15:56:24.112 [Producer 
clientId=connector-producer-d094a5d7bbb046b99d62398cb84d648c-0] Beginning 
shutdown of Kafka producer I/O thread, sending remaining records.
Apr 22, 2024 @ 15:56:24.112 [Producer 
clientId=connector-producer-d094a5d7bbb046b99d62398cb84d648c-0] Aborting 
incomplete batches due to forced shutdown
Apr 22, 2024 @ 15:56:24.113 
WorkerSourceTask{id=d094a5d7bbb046b99d62398cb84d648c-0} 
Committing offsets
Apr 22, 2024 @ 15:56:24.113 
WorkerSourceTask{id=d094a5d7bbb046b99d62398cb84d648c-0} Either 
no records were produced by the task since the last offset commit, or every 
record has been filtered out by a transformation or dropped due to 
transformation or conversion errors.
Apr 22, 2024 @ 15:56:24.146 [Worker clientId=connect-1, 
groupId=pg-group-adf06ea08abb4394ad4f2787481fee17] Finished stopping tasks in 
preparation for rebalance
Apr 22, 2024 @ 15:56:24.165 
WorkerSourceTask{id=d094a5d7bbb046b99d62398cb84d648c-0} Finished
{code}
