[jira] [Resolved] (KAFKA-16445) PATCH method for connector configuration
[ https://issues.apache.org/jira/browse/KAFKA-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Yurchenko resolved KAFKA-16445.
Resolution: Fixed

> PATCH method for connector configuration
>
> Key: KAFKA-16445
> URL: https://issues.apache.org/jira/browse/KAFKA-16445
> Project: Kafka
> Issue Type: Improvement
> Components: connect
> Reporter: Ivan Yurchenko
> Assignee: Ivan Yurchenko
> Priority: Minor
>
> As [KIP-477: Add PATCH method for connector config in Connect REST API|https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API] suggests, we should introduce the PATCH method for connector configuration.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16445) PATCH method for connector configuration
Ivan Yurchenko created KAFKA-16445:
--

Summary: PATCH method for connector configuration
Key: KAFKA-16445
URL: https://issues.apache.org/jira/browse/KAFKA-16445
Project: Kafka
Issue Type: Improvement
Components: connect
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko

As [KIP-477: Add PATCH method for connector config in Connect REST API|https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API] suggests, we should introduce the PATCH method for connector configuration.
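To illustrate what a PATCH on a connector configuration could mean in practice, here is a minimal sketch of merge-style patching. It assumes that keys in the patch overwrite existing keys and that a null value removes a key; that reading is an assumption, and KIP-477 remains the authoritative description. The function name `apply_config_patch` and the sample keys are hypothetical and not part of Kafka Connect.

```python
def apply_config_patch(current, patch):
    """Return a new config dict with `patch` merged into `current`.

    Assumed semantics (see KIP-477 for the real ones):
    - a key with a non-None value is added or overwritten;
    - a key with a None value (JSON null) is removed.
    """
    patched = dict(current)  # never mutate the caller's config
    for key, value in patch.items():
        if value is None:
            patched.pop(key, None)  # removing a missing key is a no-op
        else:
            patched[key] = value
    return patched


if __name__ == "__main__":
    existing = {"connector.class": "ExampleSink", "tasks.max": "1", "topics": "foo"}
    print(apply_config_patch(existing, {"tasks.max": "4", "topics": None}))
```

Unlike a full PUT of the configuration, the caller only sends the keys that change, so unrelated settings stay untouched.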
[jira] [Created] (KAFKA-15931) Cached transaction index gets closed if tiered storage read is interrupted
Ivan Yurchenko created KAFKA-15931:
--

Summary: Cached transaction index gets closed if tiered storage read is interrupted
Key: KAFKA-15931
URL: https://issues.apache.org/jira/browse/KAFKA-15931
Project: Kafka
Issue Type: Bug
Components: Tiered-Storage
Affects Versions: 3.6.0
Reporter: Ivan Yurchenko

This reproduces when reading from remote storage with the default {{fetch.max.wait.ms}} (500) or lower. The following error is logged:
{noformat}
[2023-11-29 14:01:01,166] ERROR Error occurred while reading the remote data for topic1-0 (kafka.log.remote.RemoteLogReader)
org.apache.kafka.common.KafkaException: Failed read position from the transaction index
    at org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:235)
    at org.apache.kafka.storage.internals.log.TransactionIndex.collectAbortedTxns(TransactionIndex.java:171)
    at kafka.log.remote.RemoteLogManager.collectAbortedTransactions(RemoteLogManager.java:1359)
    at kafka.log.remote.RemoteLogManager.addAbortedTransactions(RemoteLogManager.java:1341)
    at kafka.log.remote.RemoteLogManager.read(RemoteLogManager.java:1310)
    at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:62)
    at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:31)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.nio.channels.ClosedChannelException
    at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
    at java.base/sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:325)
    at org.apache.kafka.storage.internals.log.TransactionIndex$1.hasNext(TransactionIndex.java:233)
    ... 10 more
{noformat}
After that, this transaction index remains unusable until the process is restarted.

I suspect it's caused by the reading thread being interrupted due to the fetch timeout. At least [this code|https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c59dfd791f62d41f332db9e306bc1422/src/java.base/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L159-L160] in {{AbstractInterruptibleChannel}} is called.
[jira] [Created] (KAFKA-15107) Additional custom metadata for remote log segment
Ivan Yurchenko created KAFKA-15107:
--

Summary: Additional custom metadata for remote log segment
Key: KAFKA-15107
URL: https://issues.apache.org/jira/browse/KAFKA-15107
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko
Fix For: 3.6.0

Based on the [KIP-917: Additional custom metadata for remote log segment|https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment], the following needs to be implemented:
# {{RemoteLogSegmentMetadata.CustomMetadata}}.
# {{RemoteStorageManager.copyLogSegmentData}} needs to be updated to the new return type (+ javadoc).
# {{RemoteLogSegmentMetadata.customMetadata}} and {{RemoteLogSegmentMetadata.createWithCustomMetadata}} methods. Same in {{RemoteLogSegmentMetadataSnapshot}}.
# {{RemoteLogSegmentMetadataRecord}} and {{RemoteLogSegmentMetadataSnapshotRecord}} definitions need to be updated.
# Custom metadata should be persisted by {{RemoteLogManager}} if provided.
# The new config {{remote.log.metadata.custom.metadata.max.size}} needs to be introduced.
# The custom metadata size limit must be applied according to the KIP.
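The last two items describe a size limit on the opaque custom metadata. The following is a rough sketch of such a check, written in Python for brevity rather than in the broker's language. The function and exception names are hypothetical, and what the broker actually does when the limit is exceeded is defined by KIP-917, not by this sketch, which simply raises.

```python
class CustomMetadataTooLargeError(ValueError):
    """Hypothetical error type used only in this sketch."""


def check_custom_metadata_size(custom_metadata, max_size):
    """Return `custom_metadata` unchanged if it fits within `max_size` bytes.

    `custom_metadata` is the opaque byte string returned by the remote
    storage plugin, or None when the plugin provides no custom metadata.
    `max_size` stands in for remote.log.metadata.custom.metadata.max.size.
    """
    if custom_metadata is None:
        return None  # nothing to persist, nothing to check
    if len(custom_metadata) > max_size:
        raise CustomMetadataTooLargeError(
            f"custom metadata is {len(custom_metadata)} bytes, limit is {max_size}"
        )
    return custom_metadata
```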
[jira] [Created] (KAFKA-14795) Provide message formatter for RemoteLogMetadata
Ivan Yurchenko created KAFKA-14795:
--

Summary: Provide message formatter for RemoteLogMetadata
Key: KAFKA-14795
URL: https://issues.apache.org/jira/browse/KAFKA-14795
Project: Kafka
Issue Type: Improvement
Components: tools
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko

For debugging purposes, it'd be convenient to have a message formatter that can display the content of the {{__remote_log_metadata}} topic.
[jira] [Created] (KAFKA-13376) Allow MirrorMaker producer and consumer customization per replication flow
Ivan Yurchenko created KAFKA-13376:
--

Summary: Allow MirrorMaker producer and consumer customization per replication flow
Key: KAFKA-13376
URL: https://issues.apache.org/jira/browse/KAFKA-13376
Project: Kafka
Issue Type: Bug
Components: mirrormaker
Reporter: Ivan Yurchenko

Currently, it's possible to set producer and consumer configurations for a cluster in MirrorMaker, like this:
{noformat}
{source}.consumer.{consumer_config_name}
{target}.producer.{producer_config_name}
{noformat}
However, in some cases it makes sense to set these configs differently for different replication flows (e.g. when they have different latency/throughput trade-offs), something like:
{noformat}
{source}->{target}.{source}.consumer.{consumer_config_name}
{source}->{target}.{target}.producer.{producer_config_name}
{noformat}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
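One way to read the proposal is that a flow-level key, when present, overrides the cluster-level key of the same name. The sketch below resolves the effective client configuration under that assumption. The precedence rule and the function `resolve_client_config` are hypothetical illustrations of the proposal, not existing MirrorMaker behaviour.

```python
def resolve_client_config(props, source, target, cluster, client):
    """Resolve configs for `client` ("consumer" or "producer") of `cluster`
    within the replication flow source->target.

    Cluster-level keys ({cluster}.{client}.*) are applied first, then
    flow-level keys ({source}->{target}.{cluster}.{client}.*) override them.
    """
    cluster_prefix = f"{cluster}.{client}."
    flow_prefix = f"{source}->{target}.{cluster_prefix}"
    resolved = {}
    for prefix in (cluster_prefix, flow_prefix):  # later prefix wins
        for key, value in props.items():
            if key.startswith(prefix):
                resolved[key[len(prefix):]] = value
    return resolved


if __name__ == "__main__":
    props = {
        "A.consumer.fetch.min.bytes": "1",
        "A.consumer.max.poll.records": "500",
        "A->B.A.consumer.fetch.min.bytes": "1024",
    }
    print(resolve_client_config(props, "A", "B", "A", "consumer"))
```

With these properties, the flow A->B gets the larger `fetch.min.bytes`, while any other flow reading from A keeps the cluster-level value.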
[jira] [Created] (KAFKA-12835) Topic IDs can mismatch on brokers (after interbroker protocol version update)
Ivan Yurchenko created KAFKA-12835:
--

Summary: Topic IDs can mismatch on brokers (after interbroker protocol version update)
Key: KAFKA-12835
URL: https://issues.apache.org/jira/browse/KAFKA-12835
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.8.0
Reporter: Ivan Yurchenko

We had a Kafka cluster running version 2.8 with the interbroker protocol set to 2.7. It had a number of topics and everything was fine. Then we decided to update the interbroker protocol to 2.8 using the following procedure:
1. Run new brokers with the interbroker protocol set to 2.8.
2. Move the data from the old brokers to the new ones (normal partition reassignment API).
3. Decommission the old brokers.

At stage 2 we hit a problem: old brokers started failing on {{LeaderAndIsrRequest}} handling with
{code:java}
ERROR [Broker id=<...>] Topic Id in memory: <...> does not match the topic Id for partition <...> provided in the request: <...>. (state.change.logger)
{code}
for multiple topics. The topics had not been recreated. We checked the {{partition.metadata}} files, and the IDs there were indeed different from the values in ZooKeeper. It was fixed by deleting the metadata files (and letting them be recreated).

The logs, unfortunately, didn't show anything that might point to the cause of the issue (or it happened longer ago than we store the logs). We also tried to reproduce this, but without success.

If the community can point out what to check or beware of in the future, that would be great. We'll be happy to provide additional information if needed. Thank you!

Sorry for a ticket that might not be very actionable. We hope to at least raise awareness of this issue.
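The manual check described above (comparing {{partition.metadata}} contents with the IDs in ZooKeeper) could be scripted roughly as follows. This sketch assumes the file consists of `key: value` lines including a `topic_id` line. That format, the function names, and the way the inputs are collected are all assumptions made for illustration.

```python
def parse_topic_id(metadata_text):
    """Extract the topic ID from the text of a partition.metadata file.

    Assumes lines of the form "key: value" and returns None when no
    "topic_id" line is present.
    """
    for line in metadata_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "topic_id":
            return value.strip()
    return None


def find_mismatched_partitions(on_disk, expected):
    """Compare on-disk topic IDs with the expected ones.

    on_disk:  {(topic, partition): partition.metadata text}
    expected: {topic: topic ID, e.g. as read from ZooKeeper}
    Returns a sorted list of (topic, partition, local_id, expected_id).
    """
    mismatched = []
    for (topic, partition), text in sorted(on_disk.items()):
        local_id = parse_topic_id(text)
        if local_id is not None and topic in expected and local_id != expected[topic]:
            mismatched.append((topic, partition, local_id, expected[topic]))
    return mismatched
```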
[jira] [Created] (KAFKA-12430) emit.heartbeats.enabled = false should disable heartbeats topic creation
Ivan Yurchenko created KAFKA-12430:
--

Summary: emit.heartbeats.enabled = false should disable heartbeats topic creation
Key: KAFKA-12430
URL: https://issues.apache.org/jira/browse/KAFKA-12430
Project: Kafka
Issue Type: Bug
Components: mirrormaker
Reporter: Ivan Yurchenko

Currently, MirrorMaker 2's {{MirrorHeartbeatConnector}} emits heartbeats or not based on the {{emit.heartbeats.enabled}} setting. However, the {{heartbeats}} topic is created unconditionally. It seems that the same setting should also disable the topic creation.
[jira] [Created] (KAFKA-12235) ZkAdminManager.describeConfigs returns no config when 2+ configuration keys are specified
Ivan Yurchenko created KAFKA-12235:
--

Summary: ZkAdminManager.describeConfigs returns no config when 2+ configuration keys are specified
Key: KAFKA-12235
URL: https://issues.apache.org/jira/browse/KAFKA-12235
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.7.0
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko

When {{ZkAdminManager.describeConfigs}} receives a {{DescribeConfigsResource}} with 2 or more {{configurationKeys}} specified, it returns an empty configuration.

Here's a test for {{ZkAdminManagerTest}} that reproduces this issue:
{code:scala}
@Test
def testDescribeConfigsWithConfigurationKeys(): Unit = {
  EasyMock.expect(zkClient.getEntityConfigs(ConfigType.Topic, topic)).andReturn(TestUtils.createBrokerConfig(brokerId, "zk"))
  EasyMock.expect(metadataCache.contains(topic)).andReturn(true)

  EasyMock.replay(zkClient, metadataCache)

  val resources = List(new DescribeConfigsRequestData.DescribeConfigsResource()
    .setResourceName(topic)
    .setResourceType(ConfigResource.Type.TOPIC.id)
    .setConfigurationKeys(List("retention.ms", "retention.bytes", "segment.bytes").asJava)
  )

  val adminManager = createAdminManager()
  val results: List[DescribeConfigsResponseData.DescribeConfigsResult] = adminManager.describeConfigs(resources, true, true)
  assertEquals(Errors.NONE.code, results.head.errorCode())
  val resultConfigKeys = results.head.configs().asScala.map(r => r.name()).toSet
  assertEquals(Set("retention.ms", "retention.bytes", "segment.bytes"), resultConfigKeys)
}
{code}
It works fine with one configuration key, though.

A patch will follow shortly.
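For reference, the behaviour the test expects can be stated as a small filter: when configuration keys are given, return exactly the configs whose names are in that list, and when no keys are given (null), return everything. The Python sketch below only models that expectation. It is not the {{ZkAdminManager}} code, and `select_configs` is a hypothetical name.

```python
def select_configs(all_configs, configuration_keys):
    """Return the configs that a describe request should report.

    all_configs:        {config name: value} for the resource
    configuration_keys: list of requested names, or None for "all"
    """
    if configuration_keys is None:
        return dict(all_configs)
    wanted = set(configuration_keys)
    # Each config is kept if its name is ANY of the requested keys.
    return {name: value for name, value in all_configs.items() if name in wanted}
```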
[jira] [Created] (KAFKA-9672) Dead broker in ISR cause isr-expiration to fail with exception
Ivan Yurchenko created KAFKA-9672:
-

Summary: Dead broker in ISR cause isr-expiration to fail with exception
Key: KAFKA-9672
URL: https://issues.apache.org/jira/browse/KAFKA-9672
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.4.0, 2.4.1
Reporter: Ivan Yurchenko

We're running Kafka 2.4 and facing a pretty strange situation. Let's say there were three brokers in the cluster: 0, 1, and 2. Then:
1. Broker 3 was added.
2. Partitions were reassigned from broker 0 to broker 3.
3. Broker 0 was shut down (not gracefully) and removed from the cluster.
4. We see the following state in ZooKeeper:
{code:java}
ls /brokers/ids
[1, 2, 3]

get /brokers/topics/foo
{"version":2,"partitions":{"0":[2,1,3]},"adding_replicas":{},"removing_replicas":{}}

get /brokers/topics/foo/partitions/0/state
{"controller_epoch":123,"leader":1,"version":1,"leader_epoch":42,"isr":[0,2,3,1]}
{code}
It means the dead broker 0 remains in the partition's ISR. A large share of the partitions in the cluster have this issue.

This is actually causing errors:
{code:java}
Uncaught exception in scheduled task 'isr-expiration' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.ReplicaNotAvailableException: Replica with id 12 is not available on broker 17
{code}
It means that the {{isr-expiration}} task is effectively not working any more.

I have a suspicion that this was introduced by [this commit (line selected)|https://github.com/apache/kafka/commit/57baa4079d9fc14103411f790b9a025c9f2146a4#diff-5450baca03f57b9f2030f93a480e6969R856].

Unfortunately, I haven't been able to reproduce this in isolation.

Any hints about how to reproduce (so I can write a patch) or mitigate the issue on a running cluster are welcome.

Generally, I assume that not throwing {{ReplicaNotAvailableException}} for a dead (i.e. non-existent) broker, and instead considering it out-of-sync and removing it from the ISR, should fix the problem.
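To find affected partitions on a running cluster, the ZooKeeper data shown above can be compared mechanically: any ISR member that is not listed under {{/brokers/ids}} is a dead broker. A minimal sketch follows, taking the live broker IDs and the partition state JSON as inputs. How these are fetched from ZooKeeper is left out, and `dead_isr_members` is a hypothetical helper name.

```python
import json


def dead_isr_members(live_broker_ids, partition_state_json):
    """Return the ISR members that are not registered as live brokers.

    live_broker_ids:      broker IDs listed under /brokers/ids
    partition_state_json: content of /brokers/topics/<t>/partitions/<p>/state
    The order of the returned IDs follows their order in the ISR.
    """
    state = json.loads(partition_state_json)
    live = set(live_broker_ids)
    return [broker for broker in state["isr"] if broker not in live]


if __name__ == "__main__":
    state = '{"controller_epoch":123,"leader":1,"version":1,"leader_epoch":42,"isr":[0,2,3,1]}'
    print(dead_isr_members([1, 2, 3], state))
```

For the state quoted in this report, the only broker reported is 0.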
[jira] [Created] (KAFKA-9478) Controller may stop react on partition reassignment command in ZooKeeper
Ivan Yurchenko created KAFKA-9478:
-

Summary: Controller may stop react on partition reassignment command in ZooKeeper
Key: KAFKA-9478
URL: https://issues.apache.org/jira/browse/KAFKA-9478
Project: Kafka
Issue Type: Bug
Components: controller, core
Affects Versions: 2.4.0, 2.4.1
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko

Seemingly after [bdf2446ccce592f3c000290f11de88520327aa19|https://github.com/apache/kafka/commit/bdf2446ccce592f3c000290f11de88520327aa19], the controller may stop watching the {{/admin/reassign_partitions}} node in ZooKeeper and consequently stop accepting partition reassignment commands via ZooKeeper.

I'm not 100% sure that bdf2446ccce592f3c000290f11de88520327aa19 causes this, but it doesn't reproduce on [3fe6b5e951db8f7184a4098f8ad8a1afb2b2c1a0|https://github.com/apache/kafka/commit/3fe6b5e951db8f7184a4098f8ad8a1afb2b2c1a0], the commit right before it. It also reproduces on the trunk HEAD [a87decb9e4df5bfa092c26ae4346f65c426f1321|https://github.com/apache/kafka/commit/a87decb9e4df5bfa092c26ae4346f65c426f1321].

h1. How to reproduce

1. Run ZooKeeper and two Kafka brokers.

2. Create a topic with 100 partitions and place them on Broker 0:
{code:bash}
distro/bin/kafka-topics.sh --bootstrap-server localhost:9092,localhost:9093 --create \
    --topic foo \
    --replica-assignment $(for i in {0..99}; do echo -n "0,"; done | sed 's/.$//')
{code}
3. Add some data:
{code:bash}
seq 1 100 | bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic foo
{code}
4. Create the partition reassignment node {{/admin/reassign_partitions}} in ZooKeeper and shortly after that update the data in the node (even the same value will do). I made a simple Python script for this:
{code:python}
import time
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

# Move every partition of topic "foo" to broker 1.
reassign = {
    "version": 1,
    "partitions": []
}
for p in range(100):
    reassign["partitions"].append({"topic": "foo", "partition": p, "replicas": [1]})

# Create the reassignment node, then update it shortly afterwards.
zk.create("/admin/reassign_partitions", json.dumps(reassign).encode())
time.sleep(0.05)
zk.set("/admin/reassign_partitions", json.dumps(reassign).encode())
{code}
5. Observe that the controller doesn't react to further updates to {{/admin/reassign_partitions}} and doesn't delete the node.

Also, it can be confirmed with
{code:bash}
echo wchc | nc 127.0.0.1 2181
{code}
that there is no watch on the node in ZooKeeper (for this, you should run ZooKeeper with {{4lw.commands.whitelist=*}}).

Since it's about timing, it might not work on the first attempt, so you might need to do step 4 a couple of times. However, the reproducibility rate is pretty high.

The data in the topic and the large number of partitions are not needed per se; they only make the timing more favourable.

Controller re-election will solve the issue, but a new controller can be put in this state the same way.

h1. Proposed solution

TBD, suggestions are welcome.
[jira] [Created] (KAFKA-9143) DistributedHerder misleadingly log error on connector task reconfiguration
Ivan Yurchenko created KAFKA-9143:
-

Summary: DistributedHerder misleadingly log error on connector task reconfiguration
Key: KAFKA-9143
URL: https://issues.apache.org/jira/browse/KAFKA-9143
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko

In {{DistributedHerder}}, in the {{reconfigureConnectorTasksWithRetry}} method, there's a [callback|https://github.com/apache/kafka/blob/c552c06aed50b4d4d9a85f73ccc89bc06fa7e094/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1247]:
{code:java}
@Override
public void onCompletion(Throwable error, Void result) {
    log.error("Unexpected error during connector task reconfiguration: ", error);
    log.error("Task reconfiguration for {} failed unexpectedly, this connector will not be properly reconfigured unless manually triggered.", connName);
}
{code}
It logs an error message even when the operation succeeded (i.e., {{error}} is {{null}}).

It should include an {{if (error != null)}} condition, like in the same class [in another method|https://github.com/apache/kafka/blob/c552c06aed50b4d4d9a85f73ccc89bc06fa7e094/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L792].
[jira] [Created] (KAFKA-9035) Improve
Ivan Yurchenko created KAFKA-9035:
-

Summary: Improve
Key: KAFKA-9035
URL: https://issues.apache.org/jira/browse/KAFKA-9035
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect
Affects Versions: 2.2.1, 2.3.0, 2.1.1, 2.2.0, 2.1.0, 2.0.1, 2.0.0
Reporter: Ivan Yurchenko
Assignee: Ivan Yurchenko

Kafka Connect workers in the distributed mode can be set up so that they have the same advertised URL (e.g. {{[http://127.0.0.1:8083|http://127.0.0.1:8083/]}}). When a request (e.g., for connector creation) lands on a worker that is not the leader, it will be forwarded to the leader's advertised URL. However, if the advertised URLs are all the same, the request might never reach the leader (due to the limited number of forwards). (See https://issues.apache.org/jira/browse/KAFKA-7121 for an example.)

I propose to address this by detecting such a situation and warning the user in the log.
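The detection step could be as simple as grouping workers by advertised URL and flagging any URL shared by more than one worker. The sketch below shows that grouping in Python for brevity. The function name, the input shape, and the trailing-slash normalisation are assumptions for illustration and not the Connect implementation.

```python
from collections import defaultdict


def find_shared_advertised_urls(advertised_urls):
    """Return {url: [worker ids]} for every URL advertised by 2+ workers.

    advertised_urls: {worker id: advertised URL}
    A trailing slash is ignored, so "http://h:8083" and "http://h:8083/"
    count as the same URL.
    """
    workers_by_url = defaultdict(list)
    for worker_id, url in advertised_urls.items():
        workers_by_url[url.rstrip("/")].append(worker_id)
    return {
        url: sorted(workers)
        for url, workers in workers_by_url.items()
        if len(workers) > 1
    }


if __name__ == "__main__":
    shared = find_shared_advertised_urls({
        "worker-1": "http://127.0.0.1:8083",
        "worker-2": "http://127.0.0.1:8083/",
        "worker-3": "http://10.0.0.3:8083",
    })
    for url, workers in shared.items():
        print(f"WARN advertised URL {url} is shared by workers {workers}")
```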