[jira] [Resolved] (KAFKA-9261) NPE when updating client metadata

2020-02-05 Thread Jason Gustafson (Jira)


 [ https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-9261.

Resolution: Fixed

> NPE when updating client metadata
> ---------------------------------
>
> Key: KAFKA-9261
> URL: https://issues.apache.org/jira/browse/KAFKA-9261
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.2, 2.4.0
>
>
> We have seen the following exception recently:
> {code}
> java.lang.NullPointerException
>   at java.base/java.util.Objects.requireNonNull(Objects.java:221)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:134)
>   at org.apache.kafka.common.Cluster.<init>(Cluster.java:89)
>   at org.apache.kafka.clients.MetadataCache.computeClusterView(MetadataCache.java:120)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:82)
>   at org.apache.kafka.clients.MetadataCache.<init>(MetadataCache.java:58)
>   at org.apache.kafka.clients.Metadata.handleMetadataResponse(Metadata.java:325)
>   at org.apache.kafka.clients.Metadata.update(Metadata.java:252)
>   at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.handleCompletedMetadataResponse(NetworkClient.java:1059)
>   at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:845)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:548)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1281)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
> {code}
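> The top frames show the NPE being raised by a null check while the client builds 
> its cluster view: a partition's leader id does not resolve to any known node. A 
> minimal, self-contained sketch of that failing pattern (the class and method 
> names here are hypothetical, not the actual Cluster/MetadataCache code):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Objects;
> 
> public class ClusterViewSketch {
> 
>     // Resolve each partition's leader id to a node address; mirrors the
>     // Objects.requireNonNull frame at the top of the trace when the id is
>     // missing from the live-node map.
>     static Map<String, String> leadersByPartition(Map<Integer, String> nodesById,
>                                                   Map<String, Integer> leaderIdByPartition) {
>         Map<String, String> result = new HashMap<>();
>         for (Map.Entry<String, Integer> e : leaderIdByPartition.entrySet()) {
>             String node = nodesById.get(e.getValue());
>             // The assumption that breaks: every leader id is expected to have a node.
>             result.put(e.getKey(), Objects.requireNonNull(node, "No node for leader " + e.getValue()));
>         }
>         return result;
>     }
> 
>     public static void main(String[] args) {
>         Map<Integer, String> nodes = Map.of(1, "broker-1:9092");   // broker 2 is missing
>         Map<String, Integer> leaders = Map.of("topic-0", 2);       // partition led by broker 2
>         leadersByPartition(nodes, leaders);                        // throws NullPointerException
>     }
> }
> {code}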
> The client assumes that if a leader is included in a metadata response, then 
> the corresponding node information must also be available. There are at least 
> a couple of possible reasons why this assumption can fail:
> 1. The client can detect stale partition metadata using the available leader 
> epoch information. If stale partition metadata is detected, the client ignores 
> it and keeps the last known metadata. However, it cannot detect stale broker 
> information and always accepts the latest update. As a result, the current 
> metadata may be a mix of several metadata responses, so the invariant does not 
> hold in general (see the sketch after this list).
> 2. There is no lock which protects both the fetching of partition metadata and 
> the live brokers when handling a Metadata request. This means an 
> UpdateMetadata request can arrive concurrently and break the intended 
> invariant.
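> A hypothetical sketch of case 1, showing how retaining older partition 
> metadata while always taking the latest broker list can leave a leader id with 
> no live node (all names and values here are illustrative):
> {code:java}
> import java.util.Map;
> 
> public class MixedMetadataSketch {
> 
>     public static void main(String[] args) {
>         // Retained partition metadata: leader is broker 3 at leader epoch 7.
>         int retainedLeaderId = 3;
>         int retainedLeaderEpoch = 7;
> 
>         // A later response carries a lower leader epoch for the partition, so the
>         // client treats its partition metadata as stale and keeps the retained one.
>         int latestLeaderEpoch = 5;
>         int currentLeaderId = latestLeaderEpoch >= retainedLeaderEpoch ? 1 : retainedLeaderId;
> 
>         // The broker list from the latest response is always accepted, and it no
>         // longer contains broker 3.
>         Map<Integer, String> liveBrokers = Map.of(1, "broker-1:9092", 2, "broker-2:9092");
> 
>         // The merged view violates the invariant: a leader id with no node.
>         System.out.println("leader " + currentLeaderId + " -> " + liveBrokers.get(currentLeaderId)); // null
>     }
> }
> {code}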
> It seems case 2 has been possible for a long time, but it should be extremely 
> rare. Case 1 was only made possible with KIP-320, which added the leader 
> epoch tracking. It should also be rare, but the window for inconsistent 
> metadata is probably a bit bigger than the window for a concurrent update.
> To fix this, we should make the client more defensive about metadata updates 
> and not assume that the leader is among the live endpoints.
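> A minimal sketch of what "more defensive" could look like on the client side: 
> resolve the leader id against the known live brokers and treat a missing node 
> as "leader unknown" instead of assuming it is present (hypothetical helper, not 
> the actual patch):
> {code:java}
> import java.util.Map;
> import java.util.Optional;
> 
> public class DefensiveLeaderLookup {
> 
>     // Returns the leader's node if we know it, otherwise empty rather than
>     // failing with a NullPointerException.
>     static Optional<String> leaderNode(Map<Integer, String> liveBrokers, Integer leaderId) {
>         if (leaderId == null)
>             return Optional.empty();
>         return Optional.ofNullable(liveBrokers.get(leaderId));
>     }
> 
>     public static void main(String[] args) {
>         Map<Integer, String> liveBrokers = Map.of(1, "broker-1:9092");
>         System.out.println(leaderNode(liveBrokers, 2)); // Optional.empty, no NPE
>     }
> }
> {code}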



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9261) NPE when updating client metadata

2019-12-09 Thread Manikumar (Jira)


 [ https://issues.apache.org/jira/browse/KAFKA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar resolved KAFKA-9261.
--
Resolution: Fixed
