[jira] [Resolved] (KAFKA-9261) NPE when updating client metadata
[ https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-9261.
    Resolution: Fixed

> NPE when updating client metadata
> ---------------------------------
>
>                 Key: KAFKA-9261
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9261
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>             Fix For: 2.3.2, 2.4.0
>
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
> The client assumes that if a leader is included in the response, then node
> information must also be available. There are at least a couple possible
> reasons this assumption can fail:
> 1. The client is able to detect stale partition metadata using leader epoch
> information available. If stale partition metadata is detected, the client
> ignores it and uses the last known metadata. However, it cannot detect stale
> broker information and will always accept the latest update. This means that
> the latest metadata may be a mix of multiple metadata responses and therefore
> the invariant will not generally hold.
> 2. There is no lock which protects both the fetching of partition metadata
> and the live broker when handling a Metadata request. This means an
> UpdateMetadata request can arrive concurrently and break the intended
> invariant.
> It seems case 2 has been possible for a long time, but it should be extremely
> rare. Case 1 was only made possible with KIP-320, which added the leader
> epoch tracking. It should also be rare, but the window for inconsistent
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates
> and not assume that the leader is among the live endpoints.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
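The defensive handling proposed at the end of the description could look roughly like the sketch below. It is an illustration only, not the actual patch: `DefensiveLeaderLookup`, `Broker`, and `resolveLeader` are hypothetical names and do not correspond to Kafka's own classes. The point is that a leader id missing from the live broker set becomes "no known leader" rather than a null that later trips `Objects.requireNonNull`.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration; these are NOT Kafka's classes.
final class DefensiveLeaderLookup {

    /** Minimal broker descriptor (hypothetical). */
    static final class Broker {
        final int id;
        final String host;
        final int port;

        Broker(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Resolves a partition's leader id against the live broker set.
     * Returns Optional.empty() when the id is negative (no leader reported)
     * or when the id is not among the live brokers, so callers never
     * receive a null broker.
     */
    static Optional<Broker> resolveLeader(int leaderId, Map<Integer, Broker> liveBrokers) {
        if (leaderId < 0) {
            return Optional.empty();
        }
        // Map.get returns null for an absent key; ofNullable turns that into empty.
        return Optional.ofNullable(liveBrokers.get(leaderId));
    }
}
```

A caller would then treat an empty result the same way as a partition with no leader, for example by skipping it until the next metadata refresh.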
[jira] [Resolved] (KAFKA-9261) NPE when updating client metadata
[ https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar resolved KAFKA-9261.
-----------------------------
    Resolution: Fixed

> NPE when updating client metadata
> ---------------------------------
>
>                 Key: KAFKA-9261
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9261
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>             Fix For: 2.4.0, 2.3.2
>
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
> The client assumes that if a leader is included in the response, then node
> information must also be available. There are at least a couple possible
> reasons this assumption can fail:
> 1. The client is able to detect stale partition metadata using leader epoch
> information available. If stale partition metadata is detected, the client
> ignores it and uses the last known metadata. However, it cannot detect stale
> broker information and will always accept the latest update. This means that
> the latest metadata may be a mix of multiple metadata responses and therefore
> the invariant will not generally hold.
> 2. There is no lock which protects both the fetching of partition metadata
> and the live broker when handling a Metadata request. This means an
> UpdateMetadata request can arrive concurrently and break the intended
> invariant.
> It seems case 2 has been possible for a long time, but it should be extremely
> rare. Case 1 was only made possible with KIP-320, which added the leader
> epoch tracking. It should also be rare, but the window for inconsistent
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates
> and not assume that the leader is among the live endpoints.
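Case 1 from the description can be reproduced in a small model. The sketch below is a simplification under stated assumptions: `MixedMetadataSketch`, `PartitionInfo`, `choosePartition`, and `leaderMissing` are hypothetical names, and the epoch comparison only approximates what the real client does. It shows how keeping a cached partition entry (because the incoming one has an older leader epoch) while always taking the newest broker list can leave a leader that no listed broker matches.

```java
import java.util.Set;

// Hypothetical model for illustration; not Kafka's actual metadata classes.
final class MixedMetadataSketch {

    /** Leader id plus the leader epoch it was reported with (hypothetical). */
    static final class PartitionInfo {
        final int leaderId;
        final int leaderEpoch;

        PartitionInfo(int leaderId, int leaderEpoch) {
            this.leaderId = leaderId;
            this.leaderEpoch = leaderEpoch;
        }
    }

    /**
     * Epoch-based staleness check: if the incoming entry carries an older
     * epoch than the cached one, it is treated as stale and the cached
     * entry is kept. Otherwise the incoming entry is accepted.
     */
    static PartitionInfo choosePartition(PartitionInfo cached, PartitionInfo incoming) {
        if (cached != null && incoming.leaderEpoch < cached.leaderEpoch) {
            return cached;
        }
        return incoming;
    }

    /**
     * The broker list has no staleness check in this model, so the latest
     * one is always used. Returns true when the chosen leader is absent
     * from it, which is the broken invariant behind the NPE.
     */
    static boolean leaderMissing(PartitionInfo chosen, Set<Integer> latestBrokers) {
        return !latestBrokers.contains(chosen.leaderId);
    }
}
```

For example, with a cached entry of leader 3 at epoch 5 and a late response reporting leader 1 at epoch 4 with brokers {1, 2}, the cached leader 3 is kept while the broker list becomes {1, 2}, so `leaderMissing` returns true.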