[jira] [Created] (KAFKA-16179) NPE while handling ApiVersions during controller failover

2024-01-19 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-16179:
---

 Summary: NPE while handling ApiVersions during controller failover
 Key: KAFKA-16179
 URL: https://issues.apache.org/jira/browse/KAFKA-16179
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


{code:java}
java.lang.NullPointerException: Cannot invoke 
"org.apache.kafka.metadata.publisher.FeaturesPublisher.features()" because the 
return value of "kafka.server.ControllerServer.featuresPublisher()" is null
at 
kafka.server.ControllerServer.$anonfun$startup$12(ControllerServer.scala:242)
at 
kafka.server.SimpleApiVersionManager.features(ApiVersionManager.scala:122)
at 
kafka.server.SimpleApiVersionManager.apiVersionResponse(ApiVersionManager.scala:111)
at 
kafka.server.ControllerApis.$anonfun$handleApiVersionsRequest$3(ControllerApis.scala:566)
at 
kafka.server.ControllerApis.$anonfun$handleApiVersionsRequest$3$adapted(ControllerApis.scala:566
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16012) Incomplete range assignment in consumer

2023-12-13 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-16012:
---

 Summary: Incomplete range assignment in consumer
 Key: KAFKA-16012
 URL: https://issues.apache.org/jira/browse/KAFKA-16012
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
 Fix For: 3.7.0


We were looking into test failures here: 
https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1702475525--jolshan--kafka-15784--7cad567675/2023-12-13--001./2023-12-13–001./report.html.
 

Here is the first failure in the report:
{code:java}

test_id:    
kafkatest.tests.core.group_mode_transactions_test.GroupModeTransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers
status:     FAIL
run time:   3 minutes 4.950 seconds


    TimeoutError('Consumer consumed only 88223 out of 10 messages in 90s') 
{code}
 

We traced the failure to an apparent bug during the last rebalance before the 
group became empty. The last remaining instance seems to receive an incomplete 
assignment which prevents it from completing expected consumption on some 
partitions. Here is the rebalance from the coordinator's perspective:
{code:java}
server.log.2023-12-13-04:[2023-12-13 04:58:56,987] INFO [GroupCoordinator 3]: 
Stabilized group grouped-transactions-test-consumer-group generation 5 
(__consumer_offsets-2) with 1 members (kafka.coordinator.group.GroupCoordinator)
server.log.2023-12-13-04:[2023-12-13 04:58:56,990] INFO [GroupCoordinator 3]: 
Assignment received from leader 
consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
 for group grouped-transactions-test-consumer-group for generation 5. The group 
has 1 members, 0 of which are static. 
(kafka.coordinator.group.GroupCoordinator) {code}
The group is down to one member in generation 5. In the previous generation, 
the consumer in question reported this assignment:
{code:java}
// Gen 4: we've got partitions 0-4
[2023-12-13 04:58:52,631] DEBUG [Consumer 
clientId=consumer-grouped-transactions-test-consumer-group-1, 
groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with 
generation 4 and memberId 
consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2023-12-13 04:58:52,631] INFO [Consumer 
clientId=consumer-grouped-transactions-test-consumer-group-1, 
groupId=grouped-transactions-test-consumer-group] Notifying assignor about the 
new Assignment(partitions=[input-topic-0, input-topic-1, input-topic-2, 
input-topic-3, input-topic-4]) 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code}
However, in generation 5, we seem to be assigned only one partition:
{code:java}
// Gen 5: Now we have only partition 1? But aren't we the last member in the 
group?
[2023-12-13 04:58:56,954] DEBUG [Consumer 
clientId=consumer-grouped-transactions-test-consumer-group-1, 
groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with 
generation 5 and memberId 
consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2023-12-13 04:58:56,955] INFO [Consumer 
clientId=consumer-grouped-transactions-test-consumer-group-1, 
groupId=grouped-transactions-test-consumer-group] Notifying assignor about the 
new Assignment(partitions=[input-topic-1]) 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code}
The assignment strategy indicated in the JoinGroup for generation 5 is range. The 
decoded subscription metadata sent by the consumer is:
{code:java}
Subscription(topics=[input-topic], ownedPartitions=[], groupInstanceId=null, 
generationId=4, rackId=null) {code}
Here is the decoded assignment from the SyncGroup:
{code:java}
Assignment(partitions=[input-topic-1]) {code}
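
For reference, here is a minimal standalone sketch (illustrative only, not the Kafka 
RangeAssignor implementation) of what range-style assignment should yield when a single 
member remains subscribed to the five input-topic partitions: the sole member should 
receive all of them, not just partition 1.
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignmentSketch {
    // Simplified range-style assignment: partitions are split into contiguous
    // ranges across the subscribed members, in member order.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> members) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int perMember = numPartitions / members.size();
        int extra = numPartitions % members.size();
        int next = 0;
        for (int i = 0; i < members.size(); i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = 0; p < count; p++) {
                partitions.add(next++);
            }
            assignment.put(members.get(i), partitions);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // A single remaining member subscribed to a 5-partition topic should get
        // partitions 0-4, unlike the observed Assignment(partitions=[input-topic-1]).
        System.out.println(assign(5, List.of("sole-remaining-member")));
        // => {sole-remaining-member=[0, 1, 2, 3, 4]}
    }
}
{code}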
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15828) Protect clients from broker hostname reuse

2023-11-14 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-15828:
---

 Summary: Protect clients from broker hostname reuse
 Key: KAFKA-15828
 URL: https://issues.apache.org/jira/browse/KAFKA-15828
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In some environments such as k8s, brokers may be assigned to nodes dynamically 
from an available pool. When a cluster is rolling, it is possible for the 
client to see the same node advertised for different broker IDs in a short 
period of time. For example, kafka-1 might be initially assigned to node1. 
Before the client is able to establish a connection, it could be that kafka-3 
is now on node1 instead. Currently there is no protection in the client or in 
the protocol for this scenario. If the connection succeeds, the client will 
assume it has a good connection to kafka-1. Until something disrupts the 
connection, it will continue under this assumption even if the hostname for 
kafka-1 changes.

We have observed this scenario in practice. The client connected to the wrong 
broker through stale hostname information. It was unable to produce data 
because of persistent NOT_LEADER errors. The only way to recover in the end was 
by restarting the client to force a reconnection.

We have discussed a couple of potential solutions to this problem:
 # Make the client smarter about managing the connection/hostname mapping. When it 
detects that a hostname has changed, it should force a disconnect to ensure it 
connects to the right node.
 # Modify the protocol to verify that the client has connected to the intended 
broker. For example, we could add a field to ApiVersions indicating the intended 
broker ID; the broker receiving the request would return an error if its ID does 
not match the one in the request (see the sketch below).
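
As a rough illustration of option 2, here is a hedged sketch of the broker-side check. 
The intendedBrokerId field is hypothetical and does not exist in the current 
ApiVersions schema; it is shown only to make the proposed validation concrete.
{code:java}
// Hypothetical sketch of option 2; the intendedBrokerId field is an assumption.
public class ApiVersionsBrokerIdCheck {

    // Returns true if the request may be handled, false if the client reached
    // a node other than the one it intended to connect to.
    static boolean validateIntendedBroker(int configuredBrokerId, Integer intendedBrokerId) {
        // Older clients would not send the field at all, so a null value is accepted.
        return intendedBrokerId == null || intendedBrokerId == configuredBrokerId;
    }

    public static void main(String[] args) {
        // kafka-1 moved off node1; the client thinks it is talking to broker 1,
        // but node1 now hosts broker 3, so the request should be rejected.
        System.out.println(validateIntendedBroker(3, 1)); // false -> return an error
        System.out.println(validateIntendedBroker(1, 1)); // true  -> handle normally
    }
}
{code}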

Are there alternatives? 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15221) Potential race condition between requests from rebooted followers

2023-10-11 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-15221.
-
Fix Version/s: (was: 3.5.2)
   Resolution: Fixed

> Potential race condition between requests from rebooted followers
> -
>
> Key: KAFKA-15221
> URL: https://issues.apache.org/jira/browse/KAFKA-15221
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Calvin Liu
>Assignee: Calvin Liu
>Priority: Blocker
> Fix For: 3.7.0
>
>
> When the leader processes a fetch request, it does not acquire locks when 
> updating the replica fetch state, so there can be a race between fetch 
> requests from a rebooted follower.
> T0: broker 1 sends a fetch to broker 0 (the leader). At this moment, broker 1 is 
> not in the ISR.
> T1: broker 1 crashes.
> T2: broker 1 comes back online, receives a new broker epoch, and sends a 
> new Fetch request.
> T3: broker 0 receives the old fetch request and decides to expand the ISR.
> T4: right before broker 0 starts to fill in the AlterPartition request, the new 
> fetch request comes in and overwrites the fetch state. Broker 0 then uses the 
> new broker epoch in the AlterPartition request.
> In this way, the AlterPartition request can get around KIP-903 and wrongly 
> update the ISR.
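>
> A minimal sketch, with assumed names rather than the actual ReplicaManager code, 
> of the kind of guard that closes this race: the follower's fetch state and broker 
> epoch are read and used under a single lock, so a newer fetch from the rebooted 
> follower cannot be mixed into a decision made from the old state.
> {code:java}
> // Hedged sketch with hypothetical names; not the actual Kafka code.
> public class IsrExpansionSketch {
>     private final Object fetchStateLock = new Object();
>     private long followerBrokerEpoch;  // updated when a fetch is processed
>     private long followerFetchOffset;  // updated when a fetch is processed
>
>     // Called when processing a fetch request from the follower.
>     void updateFetchState(long brokerEpoch, long fetchOffset) {
>         synchronized (fetchStateLock) {
>             this.followerBrokerEpoch = brokerEpoch;
>             this.followerFetchOffset = fetchOffset;
>         }
>     }
>
>     // The epoch placed into AlterPartition is the one that justified the ISR
>     // expansion, not whatever a later fetch happened to write concurrently.
>     Long brokerEpochForIsrExpansion(long highWatermark) {
>         synchronized (fetchStateLock) {
>             boolean caughtUp = followerFetchOffset >= highWatermark;
>             return caughtUp ? followerBrokerEpoch : null;
>         }
>     }
> }
> {code}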



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14694) RPCProducerIdManager should not wait for a new block

2023-06-22 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14694.
-
Fix Version/s: 3.6.0
   Resolution: Fixed

> RPCProducerIdManager should not wait for a new block
> 
>
> Key: KAFKA-14694
> URL: https://issues.apache.org/jira/browse/KAFKA-14694
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.6.0
>
>
> RPCProducerIdManager initiates an async request to the controller to grab a 
> block of producer IDs and then blocks waiting for a response from the 
> controller.
> This is done in the request handler threads while holding a global lock. This 
> means that if many producers are requesting producer IDs and the controller 
> is slow to respond, many threads can get stuck waiting for the lock.
> This may also be a deadlock concern under the following scenario:
> if the controller has 1 request handler thread (1 chosen for simplicity) and 
> receives an InitProducerId request, it may deadlock. Basically, any time the 
> controller has N InitProducerId requests, where N >= the number of request 
> handler threads, there is the potential for deadlock.
> Consider this:
> 1. the request handler thread tries to handle an InitProducerId request to 
> the controller by forwarding an AllocateProducerIds request.
> 2. the request handler thread then waits on the controller response (timed 
> poll on nextProducerIdBlock)
> 3. the controller's request handler threads need to pick this request up, and 
> handle it, but the controller's request handler threads are blocked waiting 
> for the forwarded AllocateProducerIds response.
>  
> We should not block while waiting for a new block and instead return 
> immediately to free the request handler threads.
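>
> A minimal sketch, assuming simplified placeholder names rather than the actual 
> RPCProducerIdManager, of the non-blocking shape being proposed: if no producer ID 
> block is available, trigger an async fetch and return immediately so the caller can 
> answer with a retriable error instead of tying up a request handler thread.
> {code:java}
> import java.util.OptionalLong;
> import java.util.concurrent.ConcurrentLinkedQueue;
> import java.util.concurrent.atomic.AtomicBoolean;
>
> // Hedged sketch with hypothetical names.
> public class NonBlockingProducerIdSketch {
>     private final ConcurrentLinkedQueue<Long> availableIds = new ConcurrentLinkedQueue<>();
>     private final AtomicBoolean fetchInFlight = new AtomicBoolean(false);
>
>     OptionalLong generateProducerId() {
>         Long id = availableIds.poll();
>         if (id != null) {
>             return OptionalLong.of(id);
>         }
>         // No block available: kick off at most one async request and return
>         // immediately instead of waiting on the controller response.
>         if (fetchInFlight.compareAndSet(false, true)) {
>             requestNextBlockAsync();
>         }
>         return OptionalLong.empty(); // caller maps this to a retriable error
>     }
>
>     private void requestNextBlockAsync() {
>         // Placeholder: forward AllocateProducerIds to the controller; in the
>         // completion callback, enqueue the new ids and reset fetchInFlight.
>     }
> }
> {code}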



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14831) Illegal state errors should be fatal in transactional producer

2023-03-21 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14831:
---

 Summary: Illegal state errors should be fatal in transactional 
producer
 Key: KAFKA-14831
 URL: https://issues.apache.org/jira/browse/KAFKA-14831
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In KAFKA-14830, the producer hit an illegal state error. The error was 
propagated to the `Sender` thread and logged, but the producer otherwise 
continued on. It would be better to make illegal state errors fatal since 
continuing to write to transactions when the internal state is inconsistent may 
cause incorrect and unpredictable behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14830) Illegal state error in transactional producer

2023-03-21 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14830:
---

 Summary: Illegal state error in transactional producer
 Key: KAFKA-14830
 URL: https://issues.apache.org/jira/browse/KAFKA-14830
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Jason Gustafson


We have seen the following illegal state error in the producer:
{code:java}
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to 
abortable error state due to org.apache.kafka.common.errors.TimeoutException: 
Expiring 1 record(s) for topic-0:120027 ms has passed since batch creation
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to 
abortable error state due to org.apache.kafka.common.errors.TimeoutException: 
Expiring 1 record(s) for topic-1:120026 ms has passed since batch creation
[Producer clientId=client-id2, transactionalId=transactional-id] Aborting 
incomplete transaction
[Producer clientId=client-id2, transactionalId=transactional-id] Invoking 
InitProducerId with current producer ID and epoch 
ProducerIdAndEpoch(producerId=191799, epoch=0) in order to bump the epoch
[Producer clientId=client-id2, transactionalId=transactional-id] ProducerId set 
to 191799 with epoch 1
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to 
abortable error state due to org.apache.kafka.common.errors.NetworkException: 
Disconnected from node 4
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to 
abortable error state due to org.apache.kafka.common.errors.TimeoutException: 
The request timed out.
[Producer clientId=client-id2, transactionalId=transactional-id] Uncaught error 
in request completion:
java.lang.IllegalStateException: TransactionalId transactional-id: Invalid 
transition attempted from state READY to state ABORTABLE_ERROR
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:1089)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:508)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:734)
        at 
org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:739)
        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:753)
        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743)
        at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695)
        at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634)
        at 
org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
        at 
org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562)
        at 
org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
        at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
        at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575)
        at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
        at java.base/java.lang.Thread.run(Thread.java:829)
 {code}
The producer hits timeouts which cause it to abort an active transaction. After 
aborting, the producer bumps its epoch, which transitions it back to the 
`READY` state. Following this, there are two errors for inflight requests, 
which cause an illegal state transition to `ABORTABLE_ERROR`. But how could the 
transaction ABORT complete if there were still inflight requests? 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14664) Raft idle ratio is inaccurate

2023-02-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14664.
-
Resolution: Fixed

> Raft idle ratio is inaccurate
> -
>
> Key: KAFKA-14664
> URL: https://issues.apache.org/jira/browse/KAFKA-14664
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.5.0
>
>
> The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO 
> thread is. When completely idle, it should measure 1. When saturated, it 
> should measure 0. The problem with the current measurements is that they are 
> treated equally with respect to time. For example, say we poll twice with the 
> following durations:
> Poll 1: 2s
> Poll 2: 0s
> Assume that the busy time is negligible, so 2s passes overall.
> In the first measurement, 2s is spent waiting, so we compute and record a 
> ratio of 1.0. In the second measurement, no time passes, and we record 0.0. 
> The idle ratio is then computed as the average of these two values ((1.0 + 0.0) 
> / 2 = 0.5), which suggests that the process was busy for 1s and thereby 
> overestimates the true busy time.
> Instead, we should sum up the time waiting over the full interval. 2s passes 
> total here and 2s is idle, so we should compute 1.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6793) Unnecessary warning log message

2023-02-13 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-6793.

Fix Version/s: 3.5.0
   Resolution: Fixed

We resolved this by changing the logging to info level. It may still be useful 
in some cases, but most of the time the "warning" is spurious.

> Unnecessary warning log message 
> 
>
> Key: KAFKA-6793
> URL: https://issues.apache.org/jira/browse/KAFKA-6793
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Anna O
>Assignee: Philip Nee
>Priority: Minor
> Fix For: 3.5.0
>
>
> When upgraded KafkaStreams from 0.11.0.2 to 1.1.0 the following warning log 
> started to appear:
> level: WARN
> logger: org.apache.kafka.clients.consumer.ConsumerConfig
> message: The configuration 'admin.retries' was supplied but isn't a known 
> config.
> The config is not explicitly supplied to the streams.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13972) Reassignment cancellation causes stray replicas

2023-02-06 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13972.
-
Resolution: Fixed

> Reassignment cancellation causes stray replicas
> ---
>
> Key: KAFKA-13972
> URL: https://issues.apache.org/jira/browse/KAFKA-13972
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.4.1
>
>
> A stray replica is one that is left behind on a broker after the partition 
> has been reassigned to other brokers or the partition has been deleted. We 
> found one case where this can happen is after a cancelled reassignment. When 
> a reassignment is cancelled, the controller sends `StopReplica` requests to 
> any of the adding replicas, but it does not necessarily bump the leader 
> epoch. Following 
> [KIP-570|https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest],
>  brokers will ignore `StopReplica` requests if the leader epoch matches the 
> current partition leader epoch. So the controller needs to bump the epoch whenever 
> it must ensure that `StopReplica` will be honored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14672) Producer queue time does not reflect batches expired in the accumulator

2023-02-01 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14672:
---

 Summary: Producer queue time does not reflect batches expired in 
the accumulator
 Key: KAFKA-14672
 URL: https://issues.apache.org/jira/browse/KAFKA-14672
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


The producer exposes two metrics for the time a record has spent in the 
accumulator waiting to be drained:
 * `record-queue-time-avg`
 * `record-queue-time-max`

These metrics are only updated when a batch is drained for sending to the broker. 
It is also possible for a batch to expire before it can be drained, but in 
that case the metrics are not updated. This is surprising and makes the queue 
time misleading. The only metric I could find that does reflect batch 
expirations in the accumulator is the generic `record-error-rate`. It would 
make sense to let the queue-time metrics record the time spent in the queue 
regardless of the outcome of the record send attempt.
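
A minimal sketch of the proposed behavior, using hypothetical names rather than the 
actual RecordAccumulator/Sender code: the queue time is recorded on both exit paths 
from the accumulator, not only when a batch is drained.
{code:java}
// Hedged sketch; the batch and sensor types are stand-ins, not Kafka internals.
public class QueueTimeSketch {

    static class Batch {
        final long createdMs;
        Batch(long createdMs) { this.createdMs = createdMs; }
    }

    private double lastRecordedQueueTimeMs; // stand-in for the record-queue-time sensor

    private void recordQueueTime(Batch batch, long nowMs) {
        lastRecordedQueueTimeMs = nowMs - batch.createdMs;
    }

    // Path 1: batch drained and sent to the broker (queue time is recorded today).
    void onBatchDrained(Batch batch, long nowMs) {
        recordQueueTime(batch, nowMs);
    }

    // Path 2: batch expired in the accumulator (proposed: record queue time here too,
    // instead of only bumping the generic record-error-rate).
    void onBatchExpired(Batch batch, long nowMs) {
        recordQueueTime(batch, nowMs);
    }
}
{code}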



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14664) Raft idle ratio is inaccurate

2023-01-30 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14664:
---

 Summary: Raft idle ratio is inaccurate
 Key: KAFKA-14664
 URL: https://issues.apache.org/jira/browse/KAFKA-14664
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


The `poll-idle-ratio-avg` metric is intended to track how idle the raft IO 
thread is. When completely idle, it should measure 1. When saturated, it should 
measure 0. The problem with the current measurements is that they are treated 
equally with respect to time. For example, say we poll twice with the following 
durations:

Poll 1: 2s

Poll 2: 0s

Assume that the busy time is negligible, so 2s passes overall.

In the first measurement, 2s is spent waiting, so we compute and record a ratio 
of 1.0. In the second measurement, no time passes, and we record 0.0. The idle 
ratio is then computed as the average of these two values ((1.0 + 0.0) / 2 = 
0.5), which suggests that the process was busy for 1s and overestimates the true busy time.

Instead, we should sum up the time waiting over the full interval. 2s passes 
total here and 2s is idle, so we should compute 1.0.
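
A small sketch contrasting the two computations for the example above; the class and 
method names are illustrative only, not the actual raft metrics code.
{code:java}
// Hedged sketch: naive per-poll averaging vs. time-weighted idle accounting.
public class IdleRatioSketch {

    // Current behavior: each poll contributes one sample, weighted equally.
    static double perPollAverage(long[] idleMs, long[] totalMs) {
        double sum = 0.0;
        for (int i = 0; i < idleMs.length; i++) {
            sum += totalMs[i] == 0 ? 0.0 : (double) idleMs[i] / totalMs[i];
        }
        return sum / idleMs.length;
    }

    // Proposed behavior: weight by elapsed time, i.e. total idle time / total time.
    static double timeWeighted(long[] idleMs, long[] totalMs) {
        long idle = 0, total = 0;
        for (int i = 0; i < idleMs.length; i++) {
            idle += idleMs[i];
            total += totalMs[i];
        }
        return total == 0 ? 1.0 : (double) idle / total;
    }

    public static void main(String[] args) {
        long[] idleMs = {2000, 0};   // Poll 1 waits 2s, Poll 2 waits 0s
        long[] totalMs = {2000, 0};  // busy time assumed negligible
        System.out.println(perPollAverage(idleMs, totalMs)); // 0.5, overestimates busy time
        System.out.println(timeWeighted(idleMs, totalMs));   // 1.0, the thread was fully idle
    }
}
{code}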



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14644) Process should stop after failure in raft IO thread

2023-01-25 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14644.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Process should stop after failure in raft IO thread
> ---
>
> Key: KAFKA-14644
> URL: https://issues.apache.org/jira/browse/KAFKA-14644
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
> Fix For: 3.5.0
>
>
> We have seen a few cases where an unexpected error in the Raft IO thread 
> causes the process to enter a zombie state where it is no longer 
> participating in the raft quorum. In this state, a controller can no longer 
> become leader or help in elections, and brokers can no longer update 
> metadata. It may be better to stop the process in this case since there is no 
> way to recover.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14648) Do not fail clients if bootstrap servers is not immediately resolvable

2023-01-24 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14648:
---

 Summary: Do not fail clients if bootstrap servers is not 
immediately resolvable
 Key: KAFKA-14648
 URL: https://issues.apache.org/jira/browse/KAFKA-14648
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In dynamic environments, such as system tests, there is sometimes a delay 
between when a client is initialized and when the configured bootstrap servers 
become available in DNS. Currently clients will fail immediately if none of the 
bootstrap servers can resolve. It would be more convenient for these 
environments to provide a grace period to give more time for initialization. 
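
As an illustration of the grace-period idea (a standalone sketch, not the client's 
actual bootstrap logic), a resolution loop could retry DNS lookups until a deadline 
instead of failing on the first UnknownHostException:
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hedged sketch: keep retrying DNS resolution of a bootstrap server until a
// configurable grace period elapses, rather than failing the client immediately.
public class BootstrapResolutionSketch {

    static InetAddress[] resolveWithGracePeriod(String host, long gracePeriodMs)
            throws UnknownHostException, InterruptedException {
        long deadline = System.currentTimeMillis() + gracePeriodMs;
        while (true) {
            try {
                return InetAddress.getAllByName(host);
            } catch (UnknownHostException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e; // grace period exhausted, surface the original failure
                }
                Thread.sleep(1000); // wait for DNS to catch up, then retry
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveWithGracePeriod("localhost", 30_000)[0]);
    }
}
{code}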



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14644) Process should stop after failure in raft IO thread

2023-01-20 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14644:
---

 Summary: Process should stop after failure in raft IO thread
 Key: KAFKA-14644
 URL: https://issues.apache.org/jira/browse/KAFKA-14644
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


We have seen a few cases where an unexpected error in the Raft IO thread causes 
the process to enter a zombie state where it is no longer participating in the 
raft quorum. In this state, a controller can no longer become leader or help in 
elections, and brokers can no longer update metadata. It may be better to stop 
the process in this case since there is no way to recover.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14612) Topic config records written to log even when topic creation fails

2023-01-12 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14612.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Topic config records written to log even when topic creation fails
> --
>
> Key: KAFKA-14612
> URL: https://issues.apache.org/jira/browse/KAFKA-14612
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Jason Gustafson
>Assignee: Andrew Grant
>Priority: Major
> Fix For: 3.4.0
>
>
> Config records are added when handling a `CreateTopics` request here: 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L549.]
>  If the subsequent validations fail and the topic is not created, these 
> records will still be written to the log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14618) Off by one error in generated snapshot IDs causes misaligned fetching

2023-01-12 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14618:
---

 Summary: Off by one error in generated snapshot IDs causes 
misaligned fetching
 Key: KAFKA-14618
 URL: https://issues.apache.org/jira/browse/KAFKA-14618
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson
 Fix For: 3.4.0


We implemented new snapshot generation logic here: 
[https://github.com/apache/kafka/pull/12983.] A few days prior to this patch 
getting merged, we had changed the `RaftClient` API to pass the _exclusive_ 
offset when generating snapshots instead of the inclusive offset: 
[https://github.com/apache/kafka/pull/12981.] Unfortunately, the new snapshot 
generation logic was not updated accordingly. The consequence of this is that 
the state on replicas can get out of sync. In the best case, the followers fail 
replication because the offset after loading a snapshot is no longer aligned on 
a batch boundary.
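
The off-by-one can be summarized with a tiny sketch (illustrative names only, not the 
actual RaftClient API): a snapshot ID built from an inclusive offset must be shifted by 
one when the API expects the exclusive end offset.
{code:java}
// Hedged sketch of the inclusive vs. exclusive convention. If the caller still
// passes the last *included* offset where an *exclusive* end offset is expected,
// every generated snapshot ID is off by one.
public class SnapshotOffsetSketch {

    static long exclusiveEndOffset(long lastIncludedOffset) {
        // Exclusive end offset = last included offset + 1.
        return lastIncludedOffset + 1;
    }

    public static void main(String[] args) {
        long lastIncludedOffset = 999; // records 0..999 are covered by the snapshot
        System.out.println(exclusiveEndOffset(lastIncludedOffset)); // 1000
        // Using 999 directly as the end offset would shift the snapshot boundary by one
        // and leave followers resuming replication from a misaligned offset.
    }
}
{code}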



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14612) Topic config records written to log even when topic creation fails

2023-01-10 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14612:
---

 Summary: Topic config records written to log even when topic 
creation fails
 Key: KAFKA-14612
 URL: https://issues.apache.org/jira/browse/KAFKA-14612
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Jason Gustafson


Config records are added when handling a `CreateTopics` request here: 
[https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L549.]
 If the subsequent validations fail and the topic is not created, these records 
will still be written to the log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14417) Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats as fatal error

2022-12-08 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14417.
-
Fix Version/s: 4.0.0
   3.3.2
   Resolution: Fixed

> Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats 
> as fatal error
> --
>
> Key: KAFKA-14417
> URL: https://issues.apache.org/jira/browse/KAFKA-14417
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.1.0, 3.0.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Blocker
> Fix For: 4.0.0, 3.3.2
>
>
> In TransactionManager we have a handler for InitProducerIdRequests 
> [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#LL1276C14-L1276C14]
> However, we have the potential to return a REQUEST_TIMED_OUT error in 
> RPCProducerIdManager when the BrokerToControllerChannel manager times out: 
> [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L236]
>  
> or when the poll returns null: 
> [https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L170]
> Since REQUEST_TIMED_OUT is not handled by the producer, we treat it as a 
> fatal error and the producer fails. With the default of idempotent producers, 
> this can cause more issues.
> See this stack trace from 3.0:
> {code:java}
> ERROR [Producer clientId=console-producer] Aborting producer batches due to 
> fatal error (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Unexpected error in 
> InitProducerIdResponse; The request timed out.
>   at 
> org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1390)
>   at 
> org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1294)
>   at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>   at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:658)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:650)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:418)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:256)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Seems like the commit that introduced the changes was this one: 
> [https://github.com/apache/kafka/commit/72d108274c98dca44514007254552481c731c958]
>  so we are vulnerable when the server code is ibp 3.0 and beyond.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14397) Idempotent producer may bump epoch and reset sequence numbers prematurely

2022-11-16 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14397:
---

 Summary: Idempotent producer may bump epoch and reset sequence 
numbers prematurely
 Key: KAFKA-14397
 URL: https://issues.apache.org/jira/browse/KAFKA-14397
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


Suppose that idempotence is enabled in the producer and we send the following 
single-record batches to a partition leader:
 * A: epoch=0, seq=0
 * B: epoch=0, seq=1
 * C: epoch=0, seq=2

The partition leader receives all 3 of these batches and commits them to the 
log. However, the connection is lost before the `Produce` responses are 
received by the client. Subsequent retries by the producer all fail to be 
delivered.

It is possible in this scenario for the first batch `A` to reach the delivery 
timeout before the subsequent batches. This triggers the following check: 
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L642.]
Depending on whether retries have been exhausted, we may adjust sequence numbers.

The intuition behind this check is that if retries have not been exhausted, 
then we saw a fatal error and the batch could not have been written to the log. 
Hence we should bump the epoch and adjust the sequence numbers of the pending 
batches since they are presumed to be doomed to failure. So in this case, 
batches B and C might get reset with the bumped epoch:
 * B: epoch=1, seq=0
 * C: epoch=1, seq=1

This can result in duplicate records in the log.

The root of the issue is that this logic does not account for expiration of the 
delivery timeout. When the delivery timeout is reached, the number of retries 
is still likely much lower than the max allowed number of retries (which is 
`Int.MaxValue` by default).
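
A hedged sketch of the missing condition, using hypothetical names rather than the 
actual Sender fields: before bumping the epoch and resetting sequences, the check 
should distinguish a batch that merely hit its delivery timeout from one that failed 
with a genuinely fatal error and therefore cannot be in the log.
{code:java}
// Hedged sketch with hypothetical names; not the actual Sender logic.
public class EpochBumpGuardSketch {

    // Original intuition: if retries were not exhausted, the failure was fatal and the
    // batch cannot be in the log, so bumping the epoch and resetting the sequences of
    // pending batches is safe. The missing piece: a delivery-timeout expiry also leaves
    // retries unexhausted, yet the batch may already be committed, so it must not
    // trigger a reset.
    static boolean safeToBumpEpochAndResetSequences(boolean retriesExhausted,
                                                    boolean deliveryTimeoutExpired) {
        return !retriesExhausted && !deliveryTimeoutExpired;
    }

    public static void main(String[] args) {
        // Batch A expired via the delivery timeout with retries remaining; resetting the
        // sequences of B and C here is what can produce duplicates in the log.
        System.out.println(safeToBumpEpochAndResetSequences(false, true));  // false
        System.out.println(safeToBumpEpochAndResetSequences(false, false)); // true
    }
}
{code}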



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13964) kafka-configs.sh end with UnsupportedVersionException when describing TLS user with quotas

2022-11-09 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13964.
-
Resolution: Duplicate

Thanks for reporting the issue. This will be resolved by 
https://issues.apache.org/jira/browse/KAFKA-14084.

> kafka-configs.sh end with UnsupportedVersionException when describing TLS 
> user with quotas 
> ---
>
> Key: KAFKA-13964
> URL: https://issues.apache.org/jira/browse/KAFKA-13964
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, kraft
>Affects Versions: 3.2.0
> Environment: Kafka 3.2.0 running on OpenShift 4.10 in KRaft mode 
> managed by Strimzi
>Reporter: Jakub Stejskal
>Priority: Minor
>
> Usage of kafka-configs.sh ends with 
> org.apache.kafka.common.errors.UnsupportedVersionException: The broker does 
> not support DESCRIBE_USER_SCRAM_CREDENTIALS when describing a TLS user with 
> quotas enabled.
>  
> {code:java}
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --user 
> CN=encrypted-arnost` got status code 1 and stderr: -- Error while 
> executing config command with args '--bootstrap-server localhost:9092 
> --describe --user CN=encrypted-arnost' 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnsupportedVersionException: The broker does 
> not support DESCRIBE_USER_SCRAM_CREDENTIALS{code}
> STDOUT contains all necessary data, but the script itself ends with return 
> code 1 and the error above. Scram-sha has not been configured anywhere in 
> that case (not supported by KRaft). This might be fixed by adding support for 
> scram-sha in the next version (not reproducible without KRaft enabled).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14316) NoSuchElementException in feature control iterator

2022-10-18 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14316.
-
Fix Version/s: 3.4.0
   3.3.2
   Resolution: Fixed

> NoSuchElementException in feature control iterator
> --
>
> Key: KAFKA-14316
> URL: https://issues.apache.org/jira/browse/KAFKA-14316
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> We noticed this exception during testing:
> {code:java}
> java.util.NoSuchElementException
> at org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:276)
> at org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:189)
> at org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:360)
> at org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:346)
> at org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:375)
> at org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:347)
> at org.apache.kafka.controller.SnapshotGenerator.generateBatch(SnapshotGenerator.java:109)
> at org.apache.kafka.controller.SnapshotGenerator.generateBatches(SnapshotGenerator.java:126)
> at org.apache.kafka.controller.QuorumController$SnapshotGeneratorManager.run(QuorumController.java:637)
> at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:539)
> at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
> at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
> at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
> at java.base/java.lang.Thread.run(Thread.java:833)
> at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:64) 
> {code}
> The iterator `FeatureControlIterator.hasNext()` checks two conditions: 1) 
> whether we have already written the metadata version, and 2) whether the 
> underlying iterator has additional records. However, in `next()`, we also 
> check that the metadata version is at least high enough to include it in the 
> log. When this fails, then we can see an unexpected `NoSuchElementException` 
> if the underlying iterator is empty.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14319) Storage tool format command does not work with old metadata versions

2022-10-18 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14319:
---

 Summary: Storage tool format command does not work with old 
metadata versions
 Key: KAFKA-14319
 URL: https://issues.apache.org/jira/browse/KAFKA-14319
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


When using the format tool with older metadata versions, we see the following 
error:
{code:java}
$ bin/kafka-storage.sh format --cluster-id vVlhxM7VT3C-3nz7yEkiCQ --config 
config/kraft/server.properties --release-version "3.1-IV0" 
Exception in thread "main" java.lang.RuntimeException: Bootstrap metadata 
versions before 3.3-IV0 are not supported. Can't load metadata from format 
command
        at 
org.apache.kafka.metadata.bootstrap.BootstrapMetadata.<init>(BootstrapMetadata.java:83)
        at 
org.apache.kafka.metadata.bootstrap.BootstrapMetadata.fromVersion(BootstrapMetadata.java:48)
        at 
kafka.tools.StorageTool$.$anonfun$formatCommand$2(StorageTool.scala:265)
        at 
kafka.tools.StorageTool$.$anonfun$formatCommand$2$adapted(StorageTool.scala:254)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at kafka.tools.StorageTool$.formatCommand(StorageTool.scala:254)
        at kafka.tools.StorageTool$.main(StorageTool.scala:61)
        at kafka.tools.StorageTool.main(StorageTool.scala) {code}
For versions prior to `3.3-IV0`, we should skip creation of the 
`bootstrap.checkpoint` file instead of failing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14316) NoSuchElementException in feature control iterator

2022-10-18 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14316:
---

 Summary: NoSuchElementException in feature control iterator
 Key: KAFKA-14316
 URL: https://issues.apache.org/jira/browse/KAFKA-14316
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Jason Gustafson
Assignee: Jason Gustafson


We noticed this exception during testing:
{code:java}
java.util.NoSuchElementException
    at org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:276)
    at org.apache.kafka.timeline.SnapshottableHashTable$HistoricalIterator.next(SnapshottableHashTable.java:189)
    at org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:360)
    at org.apache.kafka.timeline.TimelineHashMap$EntryIterator.next(TimelineHashMap.java:346)
    at org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:375)
    at org.apache.kafka.controller.FeatureControlManager$FeatureControlIterator.next(FeatureControlManager.java:347)
    at org.apache.kafka.controller.SnapshotGenerator.generateBatch(SnapshotGenerator.java:109)
    at org.apache.kafka.controller.SnapshotGenerator.generateBatches(SnapshotGenerator.java:126)
    at org.apache.kafka.controller.QuorumController$SnapshotGeneratorManager.run(QuorumController.java:637)
    at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:539)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
    at java.base/java.lang.Thread.run(Thread.java:833)
    at org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:64) 
{code}
The iterator `FeatureControlIterator.hasNext()` checks two conditions: 1) 
whether we have already written the metadata version, and 2) whether the 
underlying iterator has additional records. However, in `next()`, we also check 
that the metadata version is at least high enough to include it in the log. 
When this fails, then we can see an unexpected `NoSuchElementException` if the 
underlying iterator is empty.
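
The mismatch can be reduced to a generic iterator-contract sketch (standalone, not the 
FeatureControlManager code): hasNext() and next() must apply the same condition, 
otherwise next() can run off the end of the underlying iterator even though hasNext() 
just returned true.
{code:java}
import java.util.Collections;
import java.util.Iterator;

// Hedged sketch of the bug class: hasNext() promises an element based on "metadata
// version not yet emitted", but next() additionally requires the version to be high
// enough to emit. When it is not, next() falls through to an empty underlying iterator
// and throws NoSuchElementException.
public class InconsistentIteratorSketch implements Iterator<String> {
    private final Iterator<String> underlying = Collections.emptyIterator();
    private final boolean metadataVersionHighEnough; // the extra condition next() checks
    private boolean wroteMetadataVersion = false;

    InconsistentIteratorSketch(boolean metadataVersionHighEnough) {
        this.metadataVersionHighEnough = metadataVersionHighEnough;
    }

    @Override
    public boolean hasNext() {
        return !wroteMetadataVersion || underlying.hasNext();
    }

    @Override
    public String next() {
        if (!wroteMetadataVersion && metadataVersionHighEnough) {
            wroteMetadataVersion = true;
            return "metadata.version record";
        }
        return underlying.next(); // empty underlying -> NoSuchElementException
    }

    public static void main(String[] args) {
        InconsistentIteratorSketch it = new InconsistentIteratorSketch(false);
        System.out.println(it.hasNext()); // true
        it.next();                        // throws NoSuchElementException
        // Fix direction: hasNext() should apply the same version check as next().
    }
}
{code}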

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14296) Partition leaders are not demoted during kraft controlled shutdown

2022-10-13 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14296.
-
Fix Version/s: 3.4.0
   3.3.2
   Resolution: Fixed

> Partition leaders are not demoted during kraft controlled shutdown
> --
>
> Key: KAFKA-14296
> URL: https://issues.apache.org/jira/browse/KAFKA-14296
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> When the BrokerServer starts its shutdown process, it transitions to 
> SHUTTING_DOWN and sets isShuttingDown to true. With this state change, 
> follower state transitions are short-circuited. This means that a broker which was 
> serving as leader would remain acting as a leader until controlled shutdown 
> completes. Instead, we want the leader and ISR state to be updated so that 
> requests will return NOT_LEADER and the client can find the new leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14292) KRaft broker controlled shutdown can be delayed indefinitely

2022-10-13 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14292.
-
Fix Version/s: 3.4.0
   3.3.2
   Resolution: Fixed

> KRaft broker controlled shutdown can be delayed indefinitely
> 
>
> Key: KAFKA-14292
> URL: https://issues.apache.org/jira/browse/KAFKA-14292
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Alyssa Huang
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> We noticed when rolling a kraft cluster that it took an unexpectedly long 
> time for one of the brokers to shutdown. In the logs, we saw the following:
> {code:java}
> Oct 11, 2022 @ 17:53:38.277   [Controller 1] The request from broker 8 to 
> shut down can not yet be granted because the lowest active offset 2283357 is 
> not greater than the broker's shutdown offset 2283358. 
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:38.277  [Controller 1] Updated the controlled shutdown 
> offset for broker 8 to 2283362.  
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:40.278  [Controller 1] Updated the controlled shutdown 
> offset for broker 8 to 2283366.  
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:40.278  [Controller 1] The request from broker 8 to 
> shut down can not yet be granted because the lowest active offset 2283361 is 
> not greater than the broker's shutdown offset 2283362. 
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:42.279  [Controller 1] The request from broker 8 to 
> shut down can not yet be granted because the lowest active offset 2283365 is 
> not greater than the broker's shutdown offset 2283366. 
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:42.279  [Controller 1] Updated the controlled shutdown 
> offset for broker 8 to 2283370.  
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:44.280  [Controller 1] The request from broker 8 to 
> shut down can not yet be granted because the lowest active offset 2283369 is 
> not greater than the broker's shutdown offset 2283370. 
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
> Oct 11, 2022 @ 17:53:44.281  [Controller 1] Updated the controlled shutdown 
> offset for broker 8 to 2283374.  
> org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG{code}
> From what I can tell, it looks like the controller waits until all brokers 
> have caught up to the {{controlledShutdownOffset}} of the broker that is 
> shutting down before allowing it to proceed. Probably the intent is to make 
> sure they have all the leader and ISR state.
> The problem is that the {{controlledShutdownOffset}} seems to be updated 
> after every heartbeat that the controller receives: 
> https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L1996.
>  Unless all other brokers can catch up to that offset before the next 
> heartbeat from the shutting down broker is received, then the broker remains 
> in the shutting down state indefinitely.
> In this case, it took more than 40 minutes before the broker completed 
> shutdown:
> {code:java}
> Oct 11, 2022 @ 18:36:36.105  [Controller 1] The request from broker 8 to 
> shut down has been granted since the lowest active offset 2288510 is now 
> greater than the broker's controlled shutdown offset 2288510.  
> org.apache.kafka.controller.BrokerHeartbeatManager  INFO
> Oct 11, 2022 @ 18:40:35.197  [Controller 1] The request from broker 8 to 
> unfence has been granted because it has caught up with the offset of it's 
> register broker record 2288906.   
> org.apache.kafka.controller.BrokerHeartbeatManager  INFO{code}
> It seems like the bug here is that we should not keep updating 
> {{controlledShutdownOffset}} if it has already been set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14292) KRaft broker controlled shutdown can be delayed indefinitely

2022-10-11 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14292:
---

 Summary: KRaft broker controlled shutdown can be delayed 
indefinitely
 Key: KAFKA-14292
 URL: https://issues.apache.org/jira/browse/KAFKA-14292
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Alyssa Huang


We noticed when rolling a kraft cluster that it took an unexpectedly long time 
for one of the brokers to shutdown. In the logs, we saw the following:
{code:java}
Oct 11, 2022 @ 17:53:38.277 [Controller 1] The request from broker 8 to 
shut down can not yet be granted because the lowest active offset 2283357 is 
not greater than the broker's shutdown offset 2283358. 
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:38.277 [Controller 1] Updated the controlled shutdown 
offset for broker 8 to 2283362.  
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:40.278 [Controller 1] Updated the controlled shutdown 
offset for broker 8 to 2283366.  
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:40.278 [Controller 1] The request from broker 8 to 
shut down can not yet be granted because the lowest active offset 2283361 is 
not greater than the broker's shutdown offset 2283362. 
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:42.279 [Controller 1] The request from broker 8 to 
shut down can not yet be granted because the lowest active offset 2283365 is 
not greater than the broker's shutdown offset 2283366. 
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:42.279 [Controller 1] Updated the controlled shutdown 
offset for broker 8 to 2283370.  
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:44.280 [Controller 1] The request from broker 8 to 
shut down can not yet be granted because the lowest active offset 2283369 is 
not greater than the broker's shutdown offset 2283370. 
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG   
Oct 11, 2022 @ 17:53:44.281 [Controller 1] Updated the controlled shutdown 
offset for broker 8 to 2283374.  
org.apache.kafka.controller.BrokerHeartbeatManager  DEBUG{code}
From what I can tell, it looks like the controller waits until all brokers 
have caught up to the {{controlledShutdownOffset}} of the broker that is 
shutting down before allowing it to proceed. Probably the intent is to make 
sure they have all the leader and ISR state.

The problem is that the {{controlledShutdownOffset}} seems to be updated after 
every heartbeat that the controller receives: 
https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L1996.
 Unless all other brokers can catch up to that offset before the next heartbeat 
from the shutting down broker is received, then the broker remains in the 
shutting down state indefinitely.

In this case, it took more than 40 minutes before the broker completed shutdown:
{code:java}
Oct 11, 2022 @ 18:36:36.105 [Controller 1] The request from broker 8 to 
shut down has been granted since the lowest active offset 2288510 is now 
greater than the broker's controlled shutdown offset 2288510.  
org.apache.kafka.controller.BrokerHeartbeatManager  INFO
Oct 11, 2022 @ 18:40:35.197 [Controller 1] The request from broker 8 to 
unfence has been granted because it has caught up with the offset of it's 
register broker record 2288906.   
org.apache.kafka.controller.BrokerHeartbeatManager  INFO{code}
It seems like the bug here is that we should not keep updating 
{{controlledShutdownOffset}} if it has already been set.
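
A hedged sketch of the suggested fix, with hypothetical field names rather than the 
actual BrokerHeartbeatManager code: set the controlled-shutdown offset once and keep 
it, so other brokers have a fixed target to catch up to.
{code:java}
import java.util.OptionalLong;

// Hedged sketch with hypothetical names.
public class ControlledShutdownOffsetSketch {
    private OptionalLong controlledShutdownOffset = OptionalLong.empty();

    void onShutdownHeartbeat(long currentEndOffset) {
        if (controlledShutdownOffset.isEmpty()) {
            controlledShutdownOffset = OptionalLong.of(currentEndOffset);
        }
        // Otherwise keep the existing offset; updating it on every heartbeat is
        // what allows the shutdown to be delayed indefinitely.
    }

    boolean mayShutDown(long lowestActiveOffset) {
        return controlledShutdownOffset.isPresent()
            && lowestActiveOffset >= controlledShutdownOffset.getAsLong();
    }
}
{code}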



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14247) Implement EventHandler interface and DefaultEventHandler

2022-10-03 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14247.
-
Resolution: Fixed

> Implement EventHandler interface and DefaultEventHandler
> 
>
> Key: KAFKA-14247
> URL: https://issues.apache.org/jira/browse/KAFKA-14247
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Major
>
> The polling thread uses events to communicate with the background thread. 
> The events sent to the background thread are the {_}Requests{_}, and the 
> events sent from the background thread to the polling thread are the 
> {_}Responses{_}.
>  
> Here we have an EventHandler interface and a DefaultEventHandler 
> implementation. The implementation uses two blocking queues to send events 
> both ways. The two methods, add and poll, allow the client, i.e., the 
> polling thread, to add events to and retrieve events from the handler.
>  
> PR: https://github.com/apache/kafka/pull/12663
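>
> A minimal sketch of the shape described above, assuming simple placeholder event 
> types rather than the actual consumer internals:
> {code:java}
> import java.util.Optional;
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.LinkedBlockingQueue;
>
> // Hedged sketch: two blocking queues carry Requests to the background thread
> // and Responses back to the polling thread. Event types are placeholders.
> interface EventHandler {
>     boolean add(RequestEvent event);    // called by the polling thread
>     Optional<ResponseEvent> poll();     // called by the polling thread
> }
>
> class RequestEvent {}
> class ResponseEvent {}
>
> class DefaultEventHandler implements EventHandler {
>     private final BlockingQueue<RequestEvent> requestQueue = new LinkedBlockingQueue<>();
>     private final BlockingQueue<ResponseEvent> responseQueue = new LinkedBlockingQueue<>();
>
>     @Override
>     public boolean add(RequestEvent event) {
>         return requestQueue.offer(event);
>     }
>
>     @Override
>     public Optional<ResponseEvent> poll() {
>         return Optional.ofNullable(responseQueue.poll()); // non-blocking drain
>     }
>
>     // The background thread would take from requestQueue and offer to responseQueue.
> }
> {code}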



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14236) ListGroups request produces too much Denied logs in authorizer

2022-09-21 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14236.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> ListGroups request produces too much Denied logs in authorizer
> --
>
> Key: KAFKA-14236
> URL: https://issues.apache.org/jira/browse/KAFKA-14236
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Alexandre GRIFFAUT
>Priority: Major
>  Labels: patch-available
> Fix For: 3.4.0
>
>
> Context
> On a multi-tenant secured cluster with many consumers, a call to the ListGroups 
> API will log an authorization error for each consumer group of other tenants.
> Reason
> The handleListGroupsRequest function first tries to authorize a DESCRIBE 
> CLUSTER, and if it fails it will then try to authorize a DESCRIBE GROUP on 
> each consumer group.
> Fix
> In that case, neither the DESCRIBE CLUSTER check nor the DESCRIBE GROUP checks 
> on other tenants' groups are really intended, so they should be specified in the 
> Action using logIfDenied: false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14240) Ensure kraft metadata log dir is initialized with valid snapshot state

2022-09-19 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14240.
-
Resolution: Fixed

> Ensure kraft metadata log dir is initialized with valid snapshot state
> --
>
> Key: KAFKA-14240
> URL: https://issues.apache.org/jira/browse/KAFKA-14240
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.3.0
>
>
> If the first segment under __cluster_metadata has a base offset greater than 
> 0, then there must exist at least one snapshot which has a larger offset than 
> whatever the first segment starts at. We should check for this at startup to 
> prevent the controller from initializing with invalid state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-17 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14238.
-
Resolution: Fixed

> KRaft replicas can delete segments not included in a snapshot
> -
>
> Key: KAFKA-14238
> URL: https://issues.apache.org/jira/browse/KAFKA-14238
> Project: Kafka
>  Issue Type: Bug
>  Components: core, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We see this in the log
> {code:java}
> Deleting segment LogSegment(baseOffset=243864, size=9269150, 
> lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) 
> due to retention time 60480ms breach based on the largest record 
> timestamp in the segment {code}
> This then cause {{KafkaRaftClient}} to throw an exception when sending 
> batches to the listener:
> {code:java}
>  java.lang.IllegalStateException: Snapshot expected since next offset of 
> org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 
> is 0, log start offset is 369668 and high-watermark is 547379
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
>   at java.base/java.util.Optional.orElseThrow(Optional.java:403)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
>   at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
> The on disk state for the cluster metadata partition confirms this:
> {code:java}
>  ls __cluster_metadata-0/
> 00369668.index
> 00369668.log
> 00369668.timeindex
> 00503411.index
> 00503411.log
> 00503411.snapshot
> 00503411.timeindex
> 00548746.snapshot
> leader-epoch-checkpoint
> partition.metadata
> quorum-state{code}
> Noticed that there are no {{checkpoint}} files and the log doesn't have a 
> segment at base offset 0.
> This is happening because the {{LogConfig}} used for KRaft sets the retention 
> policy to {{delete}} which causes the method {{deleteOldSegments}} to delete 
> old segments even if there is no snapshot for them. For KRaft, Kafka should 
> only delete segments that breach the log start offset.
> Log configuration for KRaft:
> {code:java}
>   val props = new Properties()
>   props.put(LogConfig.MaxMessageBytesProp, 
> config.maxBatchSizeInBytes.toString)
>   props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
>   props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
>   props.put(LogConfig.FileDeleteDelayMsProp, 
> Int.box(Defaults.FileDeleteDelayMs))
>   LogConfig.validateValues(props)
>   val defaultLogConfig = LogConfig(props){code}
> Segment deletion code:
> {code:java}
>  def deleteOldSegments(): Int = {
>   if (config.delete) {
> deleteLogStartOffsetBreachedSegments() +
>   deleteRetentionSizeBreachedSegments() +
>   deleteRetentionMsBreachedSegments()
>   } else {
> deleteLogStartOffsetBreachedSegments()
>   }
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14240) Ensure kraft metadata log dir is initialized with valid snapshot state

2022-09-16 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14240:
---

 Summary: Ensure kraft metadata log dir is initialized with valid 
snapshot state
 Key: KAFKA-14240
 URL: https://issues.apache.org/jira/browse/KAFKA-14240
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson


If the first segment under __cluster_metadata has a base offset greater than 0, 
then there must exist at least one snapshot with a larger offset than the first 
segment's base offset. We should check for this at startup to prevent the 
controller from initializing with invalid state.
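
For illustration, a minimal sketch of this kind of startup check. The helper 
names and the exact boundary condition below are assumptions for illustration, 
not the actual Kafka implementation:
{code:java}
import java.util.OptionalLong;

public final class MetadataLogStartupCheck {

    // firstSegmentBaseOffset: base offset of the earliest __cluster_metadata segment
    // latestSnapshotOffset: offset of the latest snapshot on disk, if any exists
    static void validate(long firstSegmentBaseOffset, OptionalLong latestSnapshotOffset) {
        if (firstSegmentBaseOffset > 0
                && (latestSnapshotOffset.isEmpty()
                    || latestSnapshotOffset.getAsLong() < firstSegmentBaseOffset)) {
            throw new IllegalStateException("Metadata log starts at offset "
                + firstSegmentBaseOffset + " but no snapshot covers the preceding offsets");
        }
    }

    public static void main(String[] args) {
        validate(0, OptionalLong.empty());          // valid: log starts at offset 0
        validate(503411, OptionalLong.of(548746));  // valid: a snapshot covers the gap
        try {
            validate(369668, OptionalLong.empty()); // invalid: no snapshot state
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}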



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14215) KRaft forwarded requests have no quota enforcement

2022-09-12 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14215.
-
Resolution: Fixed

> KRaft forwarded requests have no quota enforcement
> --
>
> Key: KAFKA-14215
> URL: https://issues.apache.org/jira/browse/KAFKA-14215
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.3
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0, 3.3
>
>
> On the broker, the `BrokerMetadataPublisher` is responsible for propagating 
> quota changes from `ClientQuota` records to `ClientQuotaManager`. On the 
> controller, there is no similar logic, so no client quotas are enforced on 
> the controller.
> On the broker side, there is no enforcement either, since the broker assumes 
> that the controller will be the one to do it. Basically, the broker looks at 
> the throttle time returned in the response from the controller. If it is 0, 
> then the response is sent immediately without any throttling. 
> So the consequence of both of these issues is that controller-bound requests 
> have no throttling today.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14224) Consumer should auto-commit prior to cooperative partition revocation

2022-09-12 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14224:
---

 Summary: Consumer should auto-commit prior to cooperative 
partition revocation
 Key: KAFKA-14224
 URL: https://issues.apache.org/jira/browse/KAFKA-14224
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


With the old "eager" reassignment logic, we always revoked all partitions prior 
to each rebalance. When auto-commit is enabled, a part of this process is 
committing current position. Under the cooperative logic, we defer revocation 
until after the rebalance, which means we can continue fetching while the 
rebalance is in progress. However, when reviewing KAFKA-14196, we noticed that 
there is no similar logic to commit offsets prior to this deferred revocation. 
This means that cooperative consumption is more likely to lead to duplicate 
consumption even when there is no failure involved.
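
For reference, the situation above applies to a consumer using a cooperative 
assignor with auto-commit enabled; a minimal sketch of that configuration 
(broker address and topic are placeholders):
{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CooperativeAutoCommitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cooperative-group");
        // Cooperative rebalancing: revocation is deferred until after the rebalance
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());
        // Auto-commit enabled: the issue above concerns committing before deferred revocation
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("some-topic"));
            consumer.poll(Duration.ofMillis(500));
        }
    }
}
{code}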



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14215) KRaft forwarded requests have no quota enforcement

2022-09-09 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14215:
---

 Summary: KRaft forwarded requests have no quota enforcement
 Key: KAFKA-14215
 URL: https://issues.apache.org/jira/browse/KAFKA-14215
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


On the broker, the `BrokerMetadataPublisher` is responsible for propagating 
quota changes from `ClientQuota` records to `ClientQuotaManager`. On the 
controller, there is no similar logic, so no client quotas are enforced on the 
controller.

On the broker side, there is no enforcement either, since the broker assumes 
that the controller will be the one to do it. Basically, the broker looks at the 
throttle time returned in the response from the controller. If it is 0, then the 
response is sent immediately without any throttling. 

So the consequence of both of these issues is that controller-bound requests 
have no throttling today.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14201) Consumer should not send group instance ID if committing with empty member ID

2022-09-02 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14201:
---

 Summary: Consumer should not send group instance ID if committing 
with empty member ID
 Key: KAFKA-14201
 URL: https://issues.apache.org/jira/browse/KAFKA-14201
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


The consumer group instance ID is used to support a notion of "static" consumer 
groups. The idea is to be able to identify the same group instance across 
restarts so that a rebalance is not needed. However, if the user sets 
`group.instance.id` in the consumer configuration, but uses "simple" assignment 
with `assign()`, then the instance ID is nevertheless sent in the OffsetCommit 
request to the coordinator. This may result in a surprising UNKNOWN_MEMBER_ID 
error. The consumer should probably be smart enough to only send the instance 
ID when committing as part of a consumer group.
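
To illustrate the surprising case, here is a minimal sketch of a consumer that 
sets `group.instance.id` but uses manual assignment (broker address, topic, 
partition, and offset are placeholders):
{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StaticIdWithAssignExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        // A static instance id is configured ...
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "instance-1");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // ... but the consumer never joins the group because it uses assign()
            TopicPartition tp = new TopicPartition("some-topic", 0);
            consumer.assign(Collections.singleton(tp));
            // The OffsetCommit then carries the instance id with an empty member id,
            // which can surface as the UNKNOWN_MEMBER_ID error described above.
            consumer.commitSync(Map.of(tp, new OffsetAndMetadata(0L)));
        }
    }
}
{code}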



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14177) Correctly support older kraft versions without FeatureLevelRecord

2022-08-25 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14177.
-
Resolution: Fixed

> Correctly support older kraft versions without FeatureLevelRecord
> -
>
> Key: KAFKA-14177
> URL: https://issues.apache.org/jira/browse/KAFKA-14177
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14183) Kraft bootstrap metadata file should use snapshot header/footer

2022-08-25 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14183:
---

 Summary: Kraft bootstrap metadata file should use snapshot 
header/footer
 Key: KAFKA-14183
 URL: https://issues.apache.org/jira/browse/KAFKA-14183
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
 Fix For: 3.3.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13166) EOFException when Controller handles unknown API

2022-08-22 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13166.
-
Resolution: Fixed

> EOFException when Controller handles unknown API
> 
>
> Key: KAFKA-13166
> URL: https://issues.apache.org/jira/browse/KAFKA-13166
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
> Fix For: 3.3.0, 3.4.0
>
>
> When ControllerApis handles an unsupported RPC, it silently drops the request 
> due to an unhandled exception. 
> The following stack trace was manually printed since this exception was 
> suppressed on the controller. 
> {code}
> java.util.NoSuchElementException: key not found: UpdateFeatures
>   at scala.collection.MapOps.default(Map.scala:274)
>   at scala.collection.MapOps.default$(Map.scala:273)
>   at scala.collection.AbstractMap.default(Map.scala:405)
>   at scala.collection.mutable.HashMap.apply(HashMap.scala:425)
>   at kafka.network.RequestChannel$Metrics.apply(RequestChannel.scala:74)
>   at 
> kafka.network.RequestChannel.$anonfun$updateErrorMetrics$1(RequestChannel.scala:458)
>   at 
> kafka.network.RequestChannel.$anonfun$updateErrorMetrics$1$adapted(RequestChannel.scala:457)
>   at 
> kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
>   at 
> scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:359)
>   at 
> scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:355)
>   at 
> scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309)
>   at 
> kafka.network.RequestChannel.updateErrorMetrics(RequestChannel.scala:457)
>   at kafka.network.RequestChannel.sendResponse(RequestChannel.scala:388)
>   at 
> kafka.server.RequestHandlerHelper.sendErrorOrCloseConnection(RequestHandlerHelper.scala:93)
>   at 
> kafka.server.RequestHandlerHelper.sendErrorResponseMaybeThrottle(RequestHandlerHelper.scala:121)
>   at 
> kafka.server.RequestHandlerHelper.handleError(RequestHandlerHelper.scala:78)
>   at kafka.server.ControllerApis.handle(ControllerApis.scala:116)
>   at 
> kafka.server.ControllerApis.$anonfun$handleEnvelopeRequest$1(ControllerApis.scala:125)
>   at 
> kafka.server.ControllerApis.$anonfun$handleEnvelopeRequest$1$adapted(ControllerApis.scala:125)
>   at 
> kafka.server.EnvelopeUtils$.handleEnvelopeRequest(EnvelopeUtils.scala:65)
>   at 
> kafka.server.ControllerApis.handleEnvelopeRequest(ControllerApis.scala:125)
>   at kafka.server.ControllerApis.handle(ControllerApis.scala:103)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This is due to a bug in the metrics code in RequestChannel.
> The result is that the request fails, but no indication is given that it was 
> due to an unsupported API on either the broker, controller, or client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13914) Implement kafka-metadata-quorum.sh

2022-08-20 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13914.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Implement kafka-metadata-quorum.sh
> --
>
> Key: KAFKA-13914
> URL: https://issues.apache.org/jira/browse/KAFKA-13914
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: dengziming
>Priority: Major
> Fix For: 3.3.0
>
>
> KIP-595 documents a tool for describing quorum status 
> `kafka-metadata-quorum.sh`: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-ToolingSupport.]
>   We need to implement this.
> Note that this depends on the Admin API for `DescribeQuorum`, which is 
> proposed in KIP-836: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-836%3A+Addition+of+Information+in+DescribeQuorumResponse+about+Voter+Lag.]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14167) Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller

2022-08-17 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14167.
-
Resolution: Fixed

> Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller
> 
>
> Key: KAFKA-14167
> URL: https://issues.apache.org/jira/browse/KAFKA-14167
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0
>
>
> In `ControllerApis`, we have callbacks such as the following after completion:
> {code:java}
>     controller.allocateProducerIds(context, allocatedProducerIdsRequest.data)
>       .handle[Unit] { (results, exception) =>
>         if (exception != null) {
>           requestHelper.handleError(request, exception)
>         } else {
>           requestHelper.sendResponseMaybeThrottle(request, requestThrottleMs 
> => {
>             results.setThrottleTimeMs(requestThrottleMs)
>             new AllocateProducerIdsResponse(results)
>           })
>         }
>       } {code}
> What I see locally is that the underlying exception that gets passed to 
> `handle` always gets wrapped in a `CompletionException`. When passed to 
> `getErrorResponse`, this error will get converted to `UNKNOWN_SERVER_ERROR`. 
> For example, in this case, a `NOT_CONTROLLER` error returned from the 
> controller would be returned as `UNKNOWN_SERVER_ERROR`. It looks like there 
> are a few APIs that are potentially affected by this bug, such as 
> `DeleteTopics` and `UpdateFeatures`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13940) DescribeQuorum returns INVALID_REQUEST if not handled by leader

2022-08-17 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13940.
-
Resolution: Fixed

> DescribeQuorum returns INVALID_REQUEST if not handled by leader
> ---
>
> Key: KAFKA-13940
> URL: https://issues.apache.org/jira/browse/KAFKA-13940
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.3.0
>
>
> In `KafkaRaftClient.handleDescribeQuorum`, we currently return 
> INVALID_REQUEST if the node is not the current raft leader. This is 
> surprising and doesn't work with our general approach for retrying forwarded 
> APIs. In `BrokerToControllerChannelManager`, we only retry after 
> `NOT_CONTROLLER` errors. It would be more consistent with the other Raft APIs 
> if we returned NOT_LEADER_OR_FOLLOWER, but that also means we need additional 
> logic in `BrokerToControllerChannelManager` to handle that error and retry 
> correctly. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14167) Unexpected UNKNOWN_SERVER_ERROR raised from kraft controller

2022-08-15 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14167:
---

 Summary: Unexpected UNKNOWN_SERVER_ERROR raised from kraft 
controller
 Key: KAFKA-14167
 URL: https://issues.apache.org/jira/browse/KAFKA-14167
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In `ControllerApis`, we have callbacks such as the following after completion:
{code:java}
    controller.allocateProducerIds(context, allocatedProducerIdsRequest.data)
      .handle[Unit] { (results, exception) =>
        if (exception != null) {
          requestHelper.handleError(request, exception)
        } else {
          requestHelper.sendResponseMaybeThrottle(request, requestThrottleMs => 
{
            results.setThrottleTimeMs(requestThrottleMs)
            new AllocateProducerIdsResponse(results)
          })
        }
      } {code}
What I see locally is that the underlying exception that gets passed to 
`handle` always gets wrapped in a `CompletionException`. When passed to 
`getErrorResponse`, this error will get converted to `UNKNOWN_SERVER_ERROR`. 
For example, in this case, a `NOT_CONTROLLER` error returned from the 
controller would be returned as `UNKNOWN_SERVER_ERROR`. It looks like there are 
a few APIs that are potentially affected by this bug, such as `DeleteTopics` 
and `UpdateFeatures`.
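
For illustration, a small standalone sketch (not Kafka code) of how a dependent 
`CompletableFuture` stage wraps the original failure in a `CompletionException`, 
and how unwrapping recovers the underlying error before it is mapped to a 
response:
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CompletionExceptionUnwrapExample {

    // Return the underlying cause if the stage wrapped it in a CompletionException
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> controllerResult = new CompletableFuture<>();
        // A dependent stage, analogous to the callback chained onto the controller future
        CompletableFuture<Void> dependent = controllerResult.thenApply(r -> r);
        controllerResult.completeExceptionally(new IllegalStateException("NOT_CONTROLLER"));

        dependent.handle((result, exception) -> {
            // Without unwrapping, the handler only sees CompletionException, which is
            // what gets translated into a generic unknown-server-error style response.
            System.out.println("raw:       " + exception.getClass().getSimpleName());
            System.out.println("unwrapped: " + unwrap(exception).getClass().getSimpleName());
            return null;
        }).join();
    }
}
{code}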



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14154) Persistent URP after controller soft failure

2022-08-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14154.
-
Resolution: Fixed

> Persistent URP after controller soft failure
> 
>
> Key: KAFKA-14154
> URL: https://issues.apache.org/jira/browse/KAFKA-14154
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We ran into a scenario where a partition leader was unable to expand the ISR 
> after a soft controller failover. Here is what happened:
> Initial state: leader=1, isr=[1,2], leader epoch=10. Broker 1 is acting as 
> the current controller.
> 1. Broker 1 loses its session in Zookeeper.  
> 2. Broker 2 becomes the new controller.
> 3. During initialization, controller 2 removes 1 from the ISR. So state is 
> updated: leader=2, isr=[2], leader epoch=11.
> 4. Broker 2 receives `LeaderAndIsr` from the new controller with leader 
> epoch=11.
> 5. Broker 2 immediately tries to add replica 1 back to the ISR since it is 
> still fetching and is caught up. However, the 
> `BrokerToControllerChannelManager` is still pointed at controller 1, so that 
> is where the `AlterPartition` is sent.
> 6. Controller 1 does not yet realize that it is not the controller, so it 
> processes the `AlterPartition` request. It sees the leader epoch of 11, which 
> is higher than what it has in its own context. Following changes to the 
> `AlterPartition` validation in 
> [https://github.com/apache/kafka/pull/12032/files], the controller returns 
> FENCED_LEADER_EPOCH.
> 7. After receiving the FENCED_LEADER_EPOCH from the old controller, the 
> leader is stuck because it assumes that the error implies that another 
> LeaderAndIsr request should be sent.
> Prior to 
> [https://github.com/apache/kafka/pull/12032/files],
>  the way we handled this case was a little different. We only verified that 
> the leader epoch in the request was at least as large as the current epoch in 
> the controller context. Anything higher was accepted. The controller would 
> have attempted to write the updated state to Zookeeper. This update would 
> have failed because of the controller epoch check, however, we would have 
> returned NOT_CONTROLLER in this case, which is handled in 
> `AlterPartitionManager`.
> It is tempting to revert the logic, but the risk is in the idempotency check: 
> [https://github.com/apache/kafka/pull/12032/files#diff-3e042c962e80577a4cc9bbcccf0950651c6b312097a86164af50003c00c50d37L2369.]
>  If the AlterPartition request happened to match the state inside the old 
> controller, the controller would consider the update successful and return no 
> error. But if its state was already stale at that point, then that might 
> cause the leader to incorrectly assume that the state had been updated.
> One way to fix this problem without weakening the validation is to rely on 
> the controller epoch in `AlterPartitionManager`. When we discover a new 
> controller, we also discover its epoch, so we can pass that through. The 
> `LeaderAndIsr` request already includes the controller epoch of the 
> controller that sent it and we already propagate this through to 
> `AlterPartition.submit`. Hence all we need to do is verify that the epoch of 
> the current controller target is at least as large as the one discovered 
> through the `LeaderAndIsr`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14166) Consistent toString implementations for byte arrays in generated messages

2022-08-15 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14166:
---

 Summary: Consistent toString implementations for byte arrays in 
generated messages
 Key: KAFKA-14166
 URL: https://issues.apache.org/jira/browse/KAFKA-14166
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


In the generated `toString()` implementations for message objects (such as 
protocol RPCs), we are a little inconsistent in how we display types with raw 
bytes. If the type is `Array[Byte]`, then we use Arrays.toString. If the type 
is `ByteBuffer` (i.e. when `zeroCopy` is set), then we use the corresponding 
`ByteBuffer.toString`, which is not often useful. We should try to be 
consistent. By default, it is probably not useful to print the full array 
contents, but we might print summary information (e.g. size, checksum).
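
As a concrete illustration of the inconsistency and of one possible 
summary-style alternative (a sketch only, not the generator's actual output 
format):
{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteFieldToStringExample {

    // One possible convention: report only summary information such as the size
    static String summarize(byte[] bytes) {
        return "bytes(size=" + bytes.length + ")";
    }

    static String summarize(ByteBuffer buffer) {
        return "bytes(size=" + buffer.remaining() + ")";
    }

    public static void main(String[] args) {
        byte[] payload = new byte[] {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(payload));             // [1, 2, 3, 4, 5]
        System.out.println(ByteBuffer.wrap(payload));             // e.g. java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]
        System.out.println(summarize(payload));                   // bytes(size=5)
        System.out.println(summarize(ByteBuffer.wrap(payload)));  // bytes(size=5)
    }
}
{code}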



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13986) DescribeQuorum does not return the observers (brokers) for the Metadata log

2022-08-11 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13986.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> DescribeQuorum does not return the observers (brokers) for the Metadata log
> ---
>
> Key: KAFKA-13986
> URL: https://issues.apache.org/jira/browse/KAFKA-13986
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Niket Goel
>Assignee: Niket Goel
>Priority: Major
> Fix For: 3.3.0
>
>
> h2. Background
> While working on the [PR|https://github.com/apache/kafka/pull/12206] for 
> KIP-836, we realized that the `DescribeQuorum` API does not return the 
> brokers as observers for the metadata log.
> As noted by [~dengziming] :
> _We set nodeId=-1 if it's a broker so observers.size==0_
> The related code is:
> [https://github.com/apache/kafka/blob/4c9eeef5b2dff9a4f0977fbc5ac7eaaf930d0d0e/core/src/main/scala/kafka/raft/RaftManager.scala#L185-L189]
> {code:java}
> val nodeId = if (config.processRoles.contains(ControllerRole)) {
>   OptionalInt.of(config.nodeId)
> } else {
>   OptionalInt.empty()
> }
> {code}
> h2. ToDo
> We should fix this and have the DescribeQuorum API return the brokers as 
> observers for the metadata log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14163) Build failing in streams-scala:compileScala due to zinc compiler cache

2022-08-11 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14163.
-
Resolution: Workaround

> Build failing in streams-scala:compileScala due to zinc compiler cache
> --
>
> Key: KAFKA-14163
> URL: https://issues.apache.org/jira/browse/KAFKA-14163
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
>
> I have been seeing builds failing recently with the following error:
> {code:java}
> [2022-08-11T17:08:22.279Z] * What went wrong:
> [2022-08-11T17:08:22.279Z] Execution failed for task 
> ':streams:streams-scala:compileScala'.
> [2022-08-11T17:08:22.279Z] > Timeout waiting to lock zinc-1.6.1_2.13.8_17 
> compiler cache (/home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17). It 
> is currently in use by another Gradle instance.
> [2022-08-11T17:08:22.279Z]   Owner PID: 25008
> [2022-08-11T17:08:22.279Z]   Our PID: 25524
> [2022-08-11T17:08:22.279Z]   Owner Operation: 
> [2022-08-11T17:08:22.279Z]   Our operation: 
> [2022-08-11T17:08:22.279Z]   Lock file: 
> /home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17/zinc-1.6.1_2.13.8_17.lock
>    {code}
> And another:
> {code:java}
> [2022-08-10T21:30:41.779Z] * What went wrong:
> [2022-08-10T21:30:41.779Z] Execution failed for task 
> ':streams:streams-scala:compileScala'.
> [2022-08-10T21:30:41.779Z] > Timeout waiting to lock zinc-1.6.1_2.13.8_17 
> compiler cache (/home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17). It 
> is currently in use by another Gradle instance.
> [2022-08-10T21:30:41.779Z]   Owner PID: 11022
> [2022-08-10T21:30:41.779Z]   Our PID: 11766
> [2022-08-10T21:30:41.779Z]   Owner Operation: 
> [2022-08-10T21:30:41.779Z]   Our operation: 
> [2022-08-10T21:30:41.779Z]   Lock file: 
> /home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17/zinc-1.6.1_2.13.8_17.lock
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14163) Build failing in scala:compileScala due to zinc compiler cache

2022-08-11 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14163:
---

 Summary: Build failing in scala:compileScala due to zinc compiler 
cache
 Key: KAFKA-14163
 URL: https://issues.apache.org/jira/browse/KAFKA-14163
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


I have been seeing builds failing recently with the following error:
{code:java}
[2022-08-10T21:30:41.779Z] * What went wrong: 
[2022-08-10T21:30:41.779Z] Execution failed for task 
':streams:streams-scala:compileScala'. 
[2022-08-10T21:30:41.779Z] > Timeout waiting to lock zinc-1.6.1_2.13.8_17 
compiler cache (/home/jenkins/.gradle/caches/7.5.1/zinc-1.6.1_2.13.8_17). It is 
currently in use by another Gradle instance.
{code}


 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14154) Persistent URP after controller soft failure

2022-08-09 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14154:
---

 Summary: Persistent URP after controller soft failure
 Key: KAFKA-14154
 URL: https://issues.apache.org/jira/browse/KAFKA-14154
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


We ran into a scenario where a partition leader was unable to expand the ISR 
after a soft controller failover. Here is what happened:

Initial state: leader=1, isr=[1,2], leader epoch=10. Broker 1 is acting as the 
current controller.

1. Broker 1 loses its session in Zookeeper.  

2. Broker 2 becomes the new controller.

3. During initialization, controller 2 removes 1 from the ISR. So state is 
updated: leader=2, isr=[2], leader epoch=11.

4. Broker 2 receives `LeaderAndIsr` from the new controller with leader 
epoch=11.

5. Broker 2 immediately tries to add replica 1 back to the ISR since it is 
still fetching and is caught up. However, the 
`BrokerToControllerChannelManager` is still pointed at controller 1, so that is 
where the `AlterPartition` is sent.

6. Controller 1 does not yet realize that it is not the controller, so it 
processes the `AlterPartition` request. It sees the leader epoch of 11, which 
is higher than what it has in its own context. Following changes to the 
`AlterPartition` validation in 
[https://github.com/apache/kafka/pull/12032/files], the controller returns 
FENCED_LEADER_EPOCH.

7. After receiving the FENCED_LEADER_EPOCH from the old controller, the leader 
is stuck because it assumes that the error implies that another LeaderAndIsr 
request should be sent.

Prior to 
[https://github.com/apache/kafka/pull/12032/files],
 the way we handled this case was a little different. We only verified that the 
leader epoch in the request was at least as large as the current epoch in the 
controller context. Anything higher was accepted. The controller would have 
attempted to write the updated state to Zookeeper. This update would have 
failed because of the controller epoch check, however, we would have returned 
NOT_CONTROLLER in this case, which is handled in `AlterPartitionManager`.

It is tempting to revert the logic, but the risk is in the idempotency check: 
[https://github.com/apache/kafka/pull/12032/files#diff-3e042c962e80577a4cc9bbcccf0950651c6b312097a86164af50003c00c50d37L2369.]
 If the AlterPartition request happened to match the state inside the old 
controller, the controller would consider the update successful and return no 
error. But if its state was already stale at that point, then that might cause 
the leader to incorrectly assume that the state had been updated.

One way to fix this problem without weakening the validation is to rely on the 
controller epoch in `AlterPartitionManager`. When we discover a new controller, 
we also discover its epoch, so we can pass that through. The `LeaderAndIsr` 
request already includes the controller epoch of the controller that sent it 
and we already propagate this through to `AlterPartition.submit`. Hence all we 
need to do is verify that the epoch of the current controller target is at 
least as large as the one discovered through the `LeaderAndIsr`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14152) Add logic to fence kraft brokers which have fallen behind in replication

2022-08-09 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14152:
---

 Summary: Add logic to fence kraft brokers which have fallen behind 
in replication
 Key: KAFKA-14152
 URL: https://issues.apache.org/jira/browse/KAFKA-14152
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


When a kraft broker registers with the controller, it must catch up to the 
current metadata before it is unfenced. However, once it has been unfenced, it 
only needs to continue sending heartbeats to remain unfenced. It can fall 
arbitrarily behind in the replication of the metadata log and remain unfenced. 
We should consider whether there is an inverse condition that we can use to 
fence a broker that has fallen behind.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14144) AlterPartition is not idempotent when requests time out

2022-08-09 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14144.
-
Resolution: Fixed

> AlterPartition is not idempotent when requests time out
> ---
>
> Key: KAFKA-14144
> URL: https://issues.apache.org/jira/browse/KAFKA-14144
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: David Mao
>Assignee: David Mao
>Priority: Blocker
> Fix For: 3.3.0
>
>
> [https://github.com/apache/kafka/pull/12032] changed the validation order of 
> AlterPartition requests to fence requests with a stale partition epoch before 
> we compare the leader and ISR contents.
> This results in a loss of idempotency if a leader does not receive an 
> AlterPartition response because retries will receive an 
> INVALID_UPDATE_VERSION error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14104) Perform CRC validation on KRaft Batch Records and Snapshots

2022-08-08 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14104.
-
Resolution: Fixed

> Perform CRC validation on KRaft Batch Records and Snapshots
> ---
>
> Key: KAFKA-14104
> URL: https://issues.apache.org/jira/browse/KAFKA-14104
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Niket Goel
>Assignee: Niket Goel
>Priority: Blocker
> Fix For: 3.3.0
>
>
> Today we stamp the BatchRecord header with a CRC [1] and verify that CRC 
> before the log is written to disk [2]. The CRC should also be verified when 
> the records are read back from disk. The same procedure should be 
> followed for KRaft snapshots as well.
> [1] 
> [https://github.com/apache/kafka/blob/6b76c01cf895db0651e2cdcc07c2c392f00a8ceb/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L501=]
>  
> [2] 
> [https://github.com/apache/kafka/blob/679e9e0cee67e7d3d2ece204a421ea7da31d73e9/core/src/main/scala/kafka/log/UnifiedLog.scala#L1143]
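> 
> For illustration only, here is a minimal sketch of the read-back verification 
> idea using the JDK's CRC32C directly (this is not Kafka's actual record 
> parsing code; the payload layout is a placeholder):
> {code:java}
> import java.util.zip.CRC32C;
> 
> public class CrcReadBackCheck {
> 
>     // Recompute the checksum over the covered bytes and compare with the stored value
>     static boolean isValid(byte[] payload, int offset, int length, long storedCrc) {
>         CRC32C crc = new CRC32C();
>         crc.update(payload, offset, length);
>         return crc.getValue() == storedCrc;
>     }
> 
>     public static void main(String[] args) {
>         byte[] batch = new byte[] {10, 20, 30, 40};
>         CRC32C crc = new CRC32C();
>         crc.update(batch, 0, batch.length);
>         long stored = crc.getValue();           // what would be stamped at write time
> 
>         System.out.println(isValid(batch, 0, batch.length, stored)); // true
>         batch[2] = 99;                          // simulate corruption on disk
>         System.out.println(isValid(batch, 0, batch.length, stored)); // false
>     }
> }
> {code}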



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14139) Replaced disk can lead to loss of committed data even with non-empty ISR

2022-08-03 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14139:
---

 Summary: Replaced disk can lead to loss of committed data even 
with non-empty ISR
 Key: KAFKA-14139
 URL: https://issues.apache.org/jira/browse/KAFKA-14139
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


We have been thinking about disk failure cases recently. Suppose that a disk 
has failed and the user needs to restart the broker from an empty disk. The 
concern is whether this can lead to the unnecessary loss of committed data.

For normal topic partitions, removal from the ISR during controlled shutdown 
buys us some protection. After the replica is restarted, it must prove its 
state to the leader before it can be added back to the ISR. And it cannot 
become a leader until it does so.

An obvious exception to this is when the replica is the last member in the ISR. 
In this case, the disk failure itself has compromised the committed data, so 
some amount of loss must be expected.

We have been considering other scenarios in which the loss of one disk can lead 
to data loss even when there are replicas remaining which have all of the 
committed entries. One such scenario is this:

Suppose we have a partition with two replicas: A and B. Initially A is the 
leader and it is the only member of the ISR.
 # Broker B catches up to A, so A attempts to send an AlterPartition request to 
the controller to add B into the ISR.
 # Before the AlterPartition request is received, replica B has a hard failure.
 # The current controller successfully fences broker B. It takes no action on 
this partition since B is already out of the ISR.
 # Before the controller receives the AlterPartition request to add B, it also 
fails.
 # While the new controller is initializing, suppose that replica B finishes 
startup, but the disk has been replaced (all of the previous state has been 
lost).
 # The new controller sees the registration from broker B first.
 # Finally, the AlterPartition from A arrives which adds B back into the ISR.

(Credit for coming up with this scenario goes to [~junrao] .)

I tested this in KRaft and confirmed that this sequence is possible (even if 
perhaps unlikely). There are a few ways we could have potentially detected the 
issue. First, perhaps the controller should have bumped the leader epoch on all 
partitions when B was fenced. Then the inflight AlterPartition would be doomed 
no matter when it arrived.

Alternatively, we could have relied on the broker epoch to distinguish the dead 
broker's state from that of the restarted broker. This could be done by 
including the broker epoch in both the `Fetch` request and in `AlterPartition`.

Finally, perhaps even normal kafka replication should be using a unique 
identifier for each disk so that we can reliably detect when it has changed. 
For example, something like what was proposed for the metadata quorum here: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes.]
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14078) Replica fetches to follower should return NOT_LEADER error

2022-07-25 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14078.
-
Resolution: Fixed

> Replica fetches to follower should return NOT_LEADER error
> --
>
> Key: KAFKA-14078
> URL: https://issues.apache.org/jira/browse/KAFKA-14078
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.3.0
>
>
> After the fix for KAFKA-13837, if a follower receives a request from another 
> replica, it will return UNKNOWN_LEADER_EPOCH even if the leader epoch 
> matches. We need to do the leader/epoch validation first before we check 
> whether we have a valid replica.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14078) Replica fetches to follower should return NOT_LEADER error

2022-07-14 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14078:
---

 Summary: Replica fetches to follower should return NOT_LEADER error
 Key: KAFKA-14078
 URL: https://issues.apache.org/jira/browse/KAFKA-14078
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson
 Fix For: 3.3.0


After the fix for KAFKA-13837, if a follower receives a request from another 
replica, it will return UNKNOWN_LEADER_EPOCH even if the leader epoch matches. 
We need to do the leader/epoch validation first before we check whether we 
have a valid replica.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14077) KRaft should support recovery from failed disk

2022-07-14 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14077:
---

 Summary: KRaft should support recovery from failed disk
 Key: KAFKA-14077
 URL: https://issues.apache.org/jira/browse/KAFKA-14077
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
 Fix For: 3.3.0


If one of the nodes in the metadata quorum has a disk failure, there is no way 
currently to safely bring the node back into the quorum. When we lose disk 
state, we are at risk of losing committed data even if the failure only affects 
a minority of the cluster.

Here is an example. Suppose that a metadata quorum has 3 members: v1, v2, and 
v3. Initially, v1 is the leader and writes a record at offset 1. After v2 
acknowledges replication of the record, it becomes committed. Suppose that v1 
fails before v3 has a chance to replicate this record. As long as v1 remains 
down, the raft protocol guarantees that only v2 can become leader, so the 
record cannot be lost. The raft protocol expects that when v1 returns, it will 
still have that record, but what if there is a disk failure, the state cannot 
be recovered and v1 participates in leader election? Then we would have 
committed data on a minority of the voters. The main problem here concerns how 
we recover from this impaired state without risking the loss of this data.

Consider a naive solution which brings v1 back with an empty disk. Since the 
node has lost its prior knowledge of the state of the quorum, it will vote for 
any candidate that comes along. If v3 becomes a candidate, then it will vote 
for itself and it just needs the vote from v1 to become leader. If that 
happens, then the committed data on v2 will become lost.

This is just one scenario. In general, the invariants that the raft protocol is 
designed to preserve go out the window when disk state is lost. For example, it 
is also possible to contrive a scenario where the loss of disk state leads to 
multiple leaders. There is a good reason why raft requires that any vote cast 
by a voter is written to disk since otherwise the voter may vote for different 
candidates in the same epoch.

Many systems solve this problem with a unique identifier which is generated 
automatically and stored on disk. This identifier is then committed to the raft 
log. If a disk changes, we would see a new identifier and we can prevent the 
node from breaking raft invariants. Then recovery from a failed disk requires a 
quorum reconfiguration. We need something like this in KRaft to make disk 
recovery possible.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14055) Transaction markers may be lost during cleaning if data keys conflict with marker keys

2022-07-10 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14055.
-
Resolution: Fixed

> Transaction markers may be lost during cleaning if data keys conflict with 
> marker keys
> --
>
> Key: KAFKA-14055
> URL: https://issues.apache.org/jira/browse/KAFKA-14055
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 3.3.0, 3.0.2, 3.1.2, 3.2.1
>
>
> We have recently been seeing hanging transactions occur on streams changelog 
> topics quite frequently. After investigation, we found that the keys used in 
> the changelog topic conflict with the keys used in the transaction markers 
> (the schema used in control records is 4 bytes, which happens to be the same 
> for the changelog topics that we investigated). When we build the offset map 
> prior to cleaning, we do properly exclude the transaction marker keys, but 
> the bug is the fact that we do not exclude them during the cleaning phase. 
> This can result in the marker being removed from the cleaned log before the 
> corresponding data is removed when there is a user record with a conflicting 
> key at a higher offset. A side effect of this is a so-called "hanging" 
> transaction, but the bigger problem is that we lose the atomicity of the 
> transaction. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14050) Older clients cannot deserialize ApiVersions response with finalized feature epoch

2022-07-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14050.
-
Resolution: Not A Problem

I'm going to close this since the incompatible schema change did not affect any 
released versions. 

> Older clients cannot deserialize ApiVersions response with finalized feature 
> epoch
> --
>
> Key: KAFKA-14050
> URL: https://issues.apache.org/jira/browse/KAFKA-14050
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0
>
>
> When testing kraft locally, we encountered this exception from an older 
> client:
> {code:java}
> [ERROR] 2022-07-05 16:45:01,165 [kafka-admin-client-thread | 
> adminclient-1394] org.apache.kafka.common.utils.KafkaThread 
> lambda$configureThread$0 - Uncaught exception in thread 
> 'kafka-admin-client-thread | adminclient-1394':
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
> 'api_keys': Error reading array of size 1207959552, only 579 bytes available
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:118)
> at 
> org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:378)
> at 
> org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:187)
> at 
> org.apache.kafka.clients.NetworkClient$DefaultClientInterceptor.parseResponse(NetworkClient.java:1333)
> at 
> org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:752)
> at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:888)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:577)
> at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1329)
> at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1260)
> at java.base/java.lang.Thread.run(Thread.java:832) {code}
> The cause appears to be from a change to the type of the 
> `FinalizedFeaturesEpoch` field in the `ApiVersions` response from int32 to 
> int64: 
> [https://github.com/apache/kafka/pull/9001/files#diff-32006e8becae918416debdb9ac76bf8a1ad12b83aaaf5f8819b6ecc00c1fb56bR58.]
> Fortunately, `FinalizedFeaturesEpoch` is a tagged field, so we can fix this 
> by creating a new field. We will have to leave the existing tag in the 
> protocol spec and consider it dead.
> Credit for this find goes to [~dajac] .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14055) Transaction markers may be lost during cleaning if data keys conflict with marker keys

2022-07-07 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14055:
---

 Summary: Transaction markers may be lost during cleaning if data 
keys conflict with marker keys
 Key: KAFKA-14055
 URL: https://issues.apache.org/jira/browse/KAFKA-14055
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
 Fix For: 3.3.0, 3.0.2, 3.1.2, 3.2.1


We have recently been seeing hanging transactions occur on streams changelog 
topics quite frequently. After investigation, we found that the keys used in 
the changelog topic conflict with the keys used in the transaction markers (the 
schema used in control records is 4 bytes, which happens to be the same for the 
changelog topics that we investigated). When we build the offset map prior to 
cleaning, we do properly exclude the transaction marker keys, but the bug is 
the fact that we do not exclude them during the cleaning phase. This can result 
in the marker being removed from the cleaned log before the corresponding data 
is removed when there is a user record with a conflicting key at a higher 
offset. A side effect of this is a so-called "hanging" transaction, but the 
bigger problem is that we lose the atomicity of the transaction. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14032) Dequeue time for forwarded requests is ignored to set

2022-07-06 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14032.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Dequeue time for forwarded requests is ignored to set
> -
>
> Key: KAFKA-14032
> URL: https://issues.apache.org/jira/browse/KAFKA-14032
> Project: Kafka
>  Issue Type: Bug
>Reporter: Feiyan Yu
>Priority: Minor
> Fix For: 3.3.0
>
>
> It seems that `requestDequeueTimeNanos` is never set for forwarded requests.
> As a property of a `Request` object, `requestDequeueTimeNanos` is set only 
> when a handler polls and handles the request from `requestQueue`. However, 
> handlers poll an envelope request once but call the handle method twice, so 
> `requestDequeueTimeNanos` is never set for the parsed forwarded request.
> The parsed envelope requests keep `requestDequeueTimeNanos` = -1, which 
> affects the correctness of the `LocalTimeMs` statistics and metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14050) Older clients cannot deserialize ApiVersions response with finalized feature epoch

2022-07-06 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14050:
---

 Summary: Older clients cannot deserialize ApiVersions response 
with finalized feature epoch
 Key: KAFKA-14050
 URL: https://issues.apache.org/jira/browse/KAFKA-14050
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
 Fix For: 3.3.0


When testing kraft locally, we encountered this exception from an older client:
{code:java}
[ERROR] 2022-07-05 16:45:01,165 [kafka-admin-client-thread | adminclient-1394] 
org.apache.kafka.common.utils.KafkaThread lambda$configureThread$0 - Uncaught 
exception in thread 'kafka-admin-client-thread | adminclient-1394':
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_keys': Error reading array of size 1207959552, only 579 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:118)
at 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:378)
at 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:187)
at 
org.apache.kafka.clients.NetworkClient$DefaultClientInterceptor.parseResponse(NetworkClient.java:1333)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:752)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:888)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:577)
at 
org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1329)
at 
org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1260)
at java.base/java.lang.Thread.run(Thread.java:832) {code}
The cause appears to be from a change to the type of the 
`FinalizedFeaturesEpoch` field in the `ApiVersions` response from int32 to 
int64: 
[https://github.com/confluentinc/ce-kafka/commit/fb4f297207ef62f71e4a6d2d0dac75752933043d#diff-32006e8becae918416debdb9ac76bf8a1ad12b83aaaf5f8819b6ecc00c1fb56bL58.]

Fortunately, `FinalizedFeaturesEpoch` is a tagged field, so we can fix this by 
creating a new field. We will have to leave the existing tag in the protocol 
spec and consider it dead.

Credit for this find goes to [~dajac] .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13943) Fix flaky test QuorumControllerTest.testMissingInMemorySnapshot()

2022-07-05 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13943.
-
Resolution: Fixed

> Fix flaky test QuorumControllerTest.testMissingInMemorySnapshot()
> -
>
> Key: KAFKA-13943
> URL: https://issues.apache.org/jira/browse/KAFKA-13943
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Divij Vaidya
>Assignee: Divij Vaidya
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.3.0, 3.3
>
>
> Test failed at 
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-12197/3/tests]
>  
> {noformat}
> [2022-05-27 09:34:42,382] INFO [Controller 0] Creating new QuorumController 
> with clusterId wj9LhgPJTV-KYEItgqvtQA, authorizer Optional.empty. 
> (org.apache.kafka.controller.QuorumController:1484)
> [2022-05-27 09:34:42,393] DEBUG [LocalLogManager 0] Node 0: running log 
> check. (org.apache.kafka.metalog.LocalLogManager:479)
> [2022-05-27 09:34:42,394] DEBUG [LocalLogManager 0] initialized local log 
> manager for node 0 (org.apache.kafka.metalog.LocalLogManager:622)
> [2022-05-27 09:34:42,396] INFO [LocalLogManager 0] Node 0: registered 
> MetaLogListener 1774961169 (org.apache.kafka.metalog.LocalLogManager:640)
> [2022-05-27 09:34:42,397] DEBUG [LocalLogManager 0] Node 0: running log 
> check. (org.apache.kafka.metalog.LocalLogManager:479)
> [2022-05-27 09:34:42,397] DEBUG [LocalLogManager 0] Node 0: Executing 
> handleLeaderChange LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1) 
> (org.apache.kafka.metalog.LocalLogManager:520)
> [2022-05-27 09:34:42,398] DEBUG [Controller 0] Executing 
> handleLeaderChange[1]. (org.apache.kafka.controller.QuorumController:438)
> [2022-05-27 09:34:42,398] INFO [Controller 0] Becoming the active controller 
> at epoch 1, committed offset -1, committed epoch -1, and metadata.version 5 
> (org.apache.kafka.controller.QuorumController:950)
> [2022-05-27 09:34:42,398] DEBUG [Controller 0] Creating snapshot -1 
> (org.apache.kafka.timeline.SnapshotRegistry:197)
> [2022-05-27 09:34:42,399] DEBUG [Controller 0] Processed 
> handleLeaderChange[1] in 951 us 
> (org.apache.kafka.controller.QuorumController:385)
> [2022-05-27 09:34:42,399] INFO [Controller 0] Initializing metadata.version 
> to 5 (org.apache.kafka.controller.QuorumController:926)
> [2022-05-27 09:34:42,399] INFO [Controller 0] Setting metadata.version to 5 
> (org.apache.kafka.controller.FeatureControlManager:273)
> [2022-05-27 09:34:42,400] DEBUG [Controller 0] Creating snapshot 
> 9223372036854775807 (org.apache.kafka.timeline.SnapshotRegistry:197)
> [2022-05-27 09:34:42,400] DEBUG [Controller 0] Read-write operation 
> bootstrapMetadata(1863535402) will be completed when the log reaches offset 
> 9223372036854775807. (org.apache.kafka.controller.QuorumController:725)
> [2022-05-27 09:34:42,402] DEBUG append(batch=LocalRecordBatch(leaderEpoch=1, 
> appendTimestamp=10, 
> records=[ApiMessageAndVersion(RegisterBrokerRecord(brokerId=0, 
> incarnationId=kxAT73dKQsitIedpiPtwBw, brokerEpoch=-9223372036854775808, 
> endPoints=[BrokerEndpoint(name='PLAINTEXT', host='localhost', port=9092, 
> securityProtocol=0)], features=[], rack=null, fenced=true) at version 0)]), 
> prevOffset=1) (org.apache.kafka.metalog.LocalLogManager$SharedLogData:247)
> [2022-05-27 09:34:42,402] INFO [Controller 0] Registered new broker: 
> RegisterBrokerRecord(brokerId=0, incarnationId=kxAT73dKQsitIedpiPtwBw, 
> brokerEpoch=-9223372036854775808, endPoints=[BrokerEndpoint(name='PLAINTEXT', 
> host='localhost', port=9092, securityProtocol=0)], features=[], rack=null, 
> fenced=true) (org.apache.kafka.controller.ClusterControlManager:368)
> [2022-05-27 09:34:42,403] WARN [Controller 0] registerBroker: failed with 
> unknown server exception RuntimeException at epoch 1 in 2449 us.  Reverting 
> to last committed offset -1. 
> (org.apache.kafka.controller.QuorumController:410)java.lang.RuntimeException: 
> Can't create a new snapshot at epoch 1 because there is already a snapshot 
> with epoch 9223372036854775807at 
> org.apache.kafka.timeline.SnapshotRegistry.getOrCreateSnapshot(SnapshotRegistry.java:190)
> at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:723)
> at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
> at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
> at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
> at java.base/java.lang.Thread.run(Thread.java:833){noformat}
> {noformat}
> Full stack trace
> java.util.concurrent.ExecutionException: 
> 

[jira] [Resolved] (KAFKA-14035) QuorumController handleRenounce throws NPE

2022-06-30 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14035.
-
Fix Version/s: 3.1.2
   3.2.1
   Resolution: Fixed

> QuorumController handleRenounce throws NPE
> --
>
> Key: KAFKA-14035
> URL: https://issues.apache.org/jira/browse/KAFKA-14035
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Niket Goel
>Assignee: Niket Goel
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
>
> Sometimes when the controller is rolled you can encounter the following 
> exception, after which the controller in-memory state seems to become 
> inconsistent with the Metadata Log.
>  
> [Controller 1] handleRenounce[23]: failed with unknown server exception 
> NullPointerException at epoch -1 in  us. Reverting to last committed 
> offset .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14036) Kraft controller local time not computed correctly.

2022-06-30 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-14036.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Kraft controller local time not computed correctly.
> ---
>
> Key: KAFKA-14036
> URL: https://issues.apache.org/jira/browse/KAFKA-14036
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
> Fix For: 3.3.0
>
>
> In `ControllerApis`, we are missing the logic to set the local processing end 
> time after `handle` returns. As a consequence of this, the remote time ends 
> up reported as the local time in the request level metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14036) Kraft controller local time not computed correctly.

2022-06-30 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-14036:
---

 Summary: Kraft controller local time not computed correctly.
 Key: KAFKA-14036
 URL: https://issues.apache.org/jira/browse/KAFKA-14036
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In `ControllerApis`, we are missing the logic to set the local processing end 
time after `handle` returns. As a consequence of this, the remote time ends up 
reported as the local time in the request level metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag

2022-06-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reopened KAFKA-13888:
-

> KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
> --
>
> Key: KAFKA-13888
> URL: https://issues.apache.org/jira/browse/KAFKA-13888
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Niket Goel
>Assignee: lqjacklee
>Priority: Major
> Fix For: 3.3.0
>
>
> Tracking issue for the implementation of KIP:836



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13888) KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag

2022-06-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13888.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> KIP-836: Addition of Information in DescribeQuorumResponse about Voter Lag
> --
>
> Key: KAFKA-13888
> URL: https://issues.apache.org/jira/browse/KAFKA-13888
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Niket Goel
>Assignee: lqjacklee
>Priority: Major
> Fix For: 3.3.0
>
>
> Tracking issue for the implementation of KIP:836



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13967) Guarantees for producer callbacks on transaction commit should be documented

2022-06-10 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13967.
-
Resolution: Fixed

> Guarantees for producer callbacks on transaction commit should be documented
> 
>
> Key: KAFKA-13967
> URL: https://issues.apache.org/jira/browse/KAFKA-13967
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>
> As discussed in 
> https://github.com/apache/kafka/pull/11780#discussion_r891835221, part of the 
> contract for a transactional producer is that all callbacks given to the 
> producer will have been invoked and completed (either successfully or by 
> throwing an exception) by the time that {{KafkaProducer::commitTransaction}} 
> returns. This should be documented so that users of the clients library can 
> have a guarantee that they're not on the hook to do that kind of bookkeeping 
> themselves.
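> As a sketch of how an application can rely on this guarantee (assuming a local 
> broker at localhost:9092 and a hypothetical transactional.id):
> {code:java}
> import java.util.Properties;
> import java.util.concurrent.atomic.AtomicInteger;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
>
> public class CommitCallbackGuarantee {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
>         props.put("transactional.id", "demo-txn");        // hypothetical transactional.id
>         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>
>         AtomicInteger completedCallbacks = new AtomicInteger();
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>             producer.initTransactions();
>             producer.beginTransaction();
>             for (int i = 0; i < 10; i++) {
>                 producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "value-" + i),
>                     (metadata, exception) -> completedCallbacks.incrementAndGet());
>             }
>             producer.commitTransaction();
>             // Per the documented guarantee, every callback has already completed
>             // (successfully or exceptionally) by the time commitTransaction() returns.
>             System.out.println("callbacks completed: " + completedCallbacks.get()); // 10
>         }
>     }
> }
> {code}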



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13972) Reassignment cancellation causes stray replicas

2022-06-08 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13972:
---

 Summary: Reassignment cancellation causes stray replicas
 Key: KAFKA-13972
 URL: https://issues.apache.org/jira/browse/KAFKA-13972
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


A stray replica is one that is left behind on a broker after the partition has 
been reassigned to other brokers or the partition has been deleted. We found 
one case where this can happen is after a cancelled reassignment. When a 
reassignment is cancelled, the controller sends `StopReplica` requests to any 
of the adding replicas, but it does not necessarily bump the leader epoch. 
Following 
[KIP-570|https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest], 
brokers will ignore `StopReplica` requests if the leader epoch matches the 
current partition leader epoch. So we need to bump the epoch whenever we need 
to ensure that `StopReplica` will be received.
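
As a rough illustration of the KIP-570-style check (hypothetical method, not the 
actual broker code), a `StopReplica` whose leader epoch has not been bumped past the 
broker's current epoch is simply ignored, which is how the stray replica survives:
{code:java}
class StopReplicaEpochCheckSketch {
    // Stale or matching epochs are rejected; only a bumped epoch gets through.
    static boolean shouldProcessStopReplica(int requestLeaderEpoch, int currentLeaderEpoch) {
        return requestLeaderEpoch > currentLeaderEpoch;
    }

    public static void main(String[] args) {
        int currentEpoch = 5;
        System.out.println(shouldProcessStopReplica(5, currentEpoch)); // false: ignored, replica left behind
        System.out.println(shouldProcessStopReplica(6, currentEpoch)); // true: epoch bumped, replica stopped
    }
}
{code}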



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13966) Flaky test `QuorumControllerTest.testUnregisterBroker`

2022-06-07 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13966:
---

 Summary: Flaky test `QuorumControllerTest.testUnregisterBroker`
 Key: KAFKA-13966
 URL: https://issues.apache.org/jira/browse/KAFKA-13966
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


We have seen the following assertion failure in 
`QuorumControllerTest.testUnregisterBroker`:

```

org.opentest4j.AssertionFailedError: expected: <2> but was: <0> at 
org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at 
org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62) at 
org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166) at 
org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161) at 
org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628) at 
org.apache.kafka.controller.QuorumControllerTest.testUnregisterBroker(QuorumControllerTest.java:494)

```

I reproduced it by running the test in a loop. It looks like what happens is 
that the BrokerRegistration request is able to get interleaved between the 
leader change event and the write of the bootstrap metadata. Something like 
this:
 # handleLeaderChange() start
 # appendWriteEvent(registerBroker)
 # appendWriteEvent(bootstrapMetadata)
 # handleLeaderChange() finish
 # registerBroker() -> writes broker registration to log
 # bootstrapMetadata() -> writes bootstrap metadata to log
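
The race can be reproduced conceptually with a plain single-threaded executor standing 
in for the controller's event queue (a toy sketch, not the actual KafkaEventQueue API): 
because the bootstrap metadata write is appended as a separate event from within the 
leader change handler, an externally appended broker registration can land in between.
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventQueueInterleavingSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService queue = Executors.newSingleThreadExecutor();
        CountDownLatch leaderChangeStarted = new CountDownLatch(1);

        // handleLeaderChange runs on the queue and appends the bootstrap metadata write
        // as a *separate* event, leaving a window for other appends to slip in between.
        queue.submit(() -> {
            System.out.println("handleLeaderChange start");
            leaderChangeStarted.countDown();
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            queue.submit(() -> System.out.println("bootstrapMetadata -> write bootstrap metadata"));
            System.out.println("handleLeaderChange finish");
        });

        // A broker registration arriving on another thread during that window is
        // enqueued ahead of the bootstrap metadata write.
        leaderChangeStarted.await();
        queue.submit(() -> System.out.println("registerBroker -> write broker registration"));

        queue.shutdown();
        queue.awaitTermination(5, TimeUnit.SECONDS);
    }
}
{code}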



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13942) LogOffsetTest occasionally hangs during Jenkins build

2022-06-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13942.
-
Resolution: Fixed

> LogOffsetTest occasionally hangs during Jenkins build
> -
>
> Key: KAFKA-13942
> URL: https://issues.apache.org/jira/browse/KAFKA-13942
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: David Arthur
>Priority: Minor
>
> [~hachikuji] parsed the log output of one of the recent stalled Jenkins 
> builds and singled out LogOffsetTest as a likely culprit for not completing.
> I looked closely at the following build which appeared to be stuck and found 
> this test case had STARTED but not PASSED or FAILED.
> 15:19:58  LogOffsetTest > 
> testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps(String) > 
> kafka.server.LogOffsetTest.testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps(String)[2]
>  STARTED



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13592) Fix flaky test ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions

2022-06-06 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13592.
-
Resolution: Fixed

> Fix flaky test 
> ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions
> -
>
> Key: KAFKA-13592
> URL: https://issues.apache.org/jira/browse/KAFKA-13592
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Jacot
>Assignee: Kvicii.Yu
>Priority: Minor
>
> {noformat}
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:529)
>   at scala.None$.get(Option.scala:527)
>   at 
> kafka.controller.ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions(ControllerIntegrationTest.scala:1239){noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13946) setMetadataDirectory() method in builder for ControllerNode has no parameters

2022-05-30 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13946.
-
Resolution: Fixed

> setMetadataDirectory() method in builder for ControllerNode has no parameters
> -
>
> Key: KAFKA-13946
> URL: https://issues.apache.org/jira/browse/KAFKA-13946
> Project: Kafka
>  Issue Type: Bug
>Reporter: Clara Fang
>Priority: Minor
>
> In core/src/test/java/kafka/testkit/ControllerNode.java, the method 
> setMetadataDirectory for the builder has no parameters and is assigning the 
> variable metadataDirectory to itself.
> {code:java}
> public Builder setMetadataDirectory() {
> this.metadataDirectory = metadataDirectory;
> return this;
> }
> {code}
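> The likely fix is simply to take the directory as a parameter so the assignment is 
> no longer a self-assignment no-op (a sketch of the corrected builder method):
> {code:java}
> public Builder setMetadataDirectory(String metadataDirectory) {
>     this.metadataDirectory = metadataDirectory;
>     return this;
> }
> {code}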



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13941) Re-enable ARM builds following INFRA-23305

2022-05-27 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13941.
-
Resolution: Fixed

> Re-enable ARM builds following INFRA-23305
> --
>
> Key: KAFKA-13941
> URL: https://issues.apache.org/jira/browse/KAFKA-13941
> Project: Kafka
>  Issue Type: Task
>  Components: build
>Reporter: David Arthur
>Priority: Major
>
> Once https://issues.apache.org/jira/browse/INFRA-23305 is resolved, we should 
> re-enable ARM builds in the Jenkinsfile.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13944) Shutting down broker can be elected as partition leader in KRaft

2022-05-27 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13944:
---

 Summary: Shutting down broker can be elected as partition leader 
in KRaft
 Key: KAFKA-13944
 URL: https://issues.apache.org/jira/browse/KAFKA-13944
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN 
state in the controller. It is possible for the broker to remain unfenced in 
this state until the controlled shutdown completes. When doing an election, the 
only thing we generally check is that the broker is unfenced, so this means we 
can elect a broker that is in controlled shutdown. 

Here are a few snippets from a recent system test in which this occurred:
{code:java}
// broker 2 starts controlled shutdown
[2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has 
requested and been granted a controlled shutdown. 
(org.apache.kafka.controller.BrokerHeartbeatManager)
 
// there is only one replica, so we set leader to -1
[2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 
with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, 
partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)

// controlled shutdown cannot complete immediately
[2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to 
shut down can not yet be granted because the lowest active offset 177 is not 
greater than the broker's shutdown offset 244. 
(org.apache.kafka.controller.BrokerHeartbeatManager)
[2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled 
shutdown offset for broker 2 to 244. 
(org.apache.kafka.controller.BrokerHeartbeatManager)

// later on we elect leader 2 again
[2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 
with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, 
partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)

// now controlled shutdown is stuck because of the newly elected leader
[2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled 
shutdown state, but can not shut down because more leaders still need to be 
moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
{code}
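
A sketch of the election predicate in question (hypothetical types, not the actual 
ReplicationControlManager code): checking only that a broker is unfenced lets a broker 
in controlled shutdown win the election, so the eligibility test also needs to exclude 
brokers that are shutting down.
{code:java}
import java.util.List;
import java.util.Optional;

class LeaderElectionSketch {
    record BrokerState(int id, boolean fenced, boolean inControlledShutdown) {}

    static Optional<Integer> electLeader(List<BrokerState> replicas) {
        return replicas.stream()
            .filter(b -> !b.fenced() && !b.inControlledShutdown()) // the extra condition
            .map(BrokerState::id)
            .findFirst();
    }

    public static void main(String[] args) {
        // Broker 2 is unfenced but in controlled shutdown, so it must not be re-elected.
        List<BrokerState> replicas = List.of(new BrokerState(2, false, true));
        System.out.println(electLeader(replicas)); // Optional.empty -> leader stays -1
    }
}
{code}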



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13858) Kraft should not shutdown metadata listener until controller shutdown is finished

2022-05-25 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13858.
-
Fix Version/s: 3.3
   Resolution: Fixed

> Kraft should not shutdown metadata listener until controller shutdown is 
> finished
> -
>
> Key: KAFKA-13858
> URL: https://issues.apache.org/jira/browse/KAFKA-13858
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: David Jacot
>Priority: Major
>  Labels: kip-500
> Fix For: 3.3
>
>
> When the kraft broker begins controlled shutdown, it immediately disables the 
> metadata listener. This means that metadata changes as part of the controlled 
> shutdown do not get sent to the respective components. For partitions that 
> the broker is follower of, that is what we want. It prevents the follower 
> from being able to rejoin the ISR while still shutting down. But for 
> partitions that the broker is leading, it means the leader will remain active 
> until controlled shutdown finishes and the socket server is stopped. That 
> delay can be as much as 5 seconds, and possibly longer.
> In the zk world, we have an explicit request `StopReplica` which serves the 
> purpose of shutting down both follower and leader, but we don't have 
> something similar in kraft. For KRaft, we may not necessarily need an 
> explicit signal like this. We know that the broker is shutting down, so we 
> can treat partition changes as implicit `StopReplica` requests rather than 
> going through the normal `LeaderAndIsr` flow.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13940) DescribeQuorum returns INVALID_REQUEST if not handled by leader

2022-05-25 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13940:
---

 Summary: DescribeQuorum returns INVALID_REQUEST if not handled by 
leader
 Key: KAFKA-13940
 URL: https://issues.apache.org/jira/browse/KAFKA-13940
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In `KafkaRaftClient.handleDescribeQuorum`, we currently return INVALID_REQUEST 
if the node is not the current raft leader. This is surprising and doesn't work 
with our general approach for retrying forwarded APIs. In 
`BrokerToControllerChannelManager`, we only retry after `NOT_CONTROLLER` 
errors. It would be more consistent with the other Raft APIs if we returned 
NOT_LEADER_OR_FOLLOWER, but that also means we need additional logic in 
`BrokerToControllerChannelManager` to handle that error and retry correctly. 
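
For illustration, the retry decision looks roughly like the following sketch (not the 
actual BrokerToControllerChannelManager code): only NOT_CONTROLLER is treated as 
retriable today, so either error code returned by a non-leader fails the forwarded 
request unless extra handling is added.
{code:java}
import org.apache.kafka.common.protocol.Errors;

class DescribeQuorumRetrySketch {
    static boolean shouldRetry(Errors error) {
        // Today only NOT_CONTROLLER triggers a retry against the new controller;
        // NOT_LEADER_OR_FOLLOWER would need to be added here to become retriable.
        return error == Errors.NOT_CONTROLLER;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(Errors.INVALID_REQUEST));        // false -> surfaced to the caller
        System.out.println(shouldRetry(Errors.NOT_LEADER_OR_FOLLOWER)); // false today -> would also fail
        System.out.println(shouldRetry(Errors.NOT_CONTROLLER));         // true  -> request is retried
    }
}
{code}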



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13923) ZooKeeperAuthorizerTest should use standard authorizer for kraft

2022-05-23 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13923.
-
Resolution: Fixed

> ZooKeeperAuthorizerTest should use standard authorizer for kraft
> 
>
> Key: KAFKA-13923
> URL: https://issues.apache.org/jira/browse/KAFKA-13923
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> Our system test `ZooKeeperAuthorizerTest` relies on the zk-based 
> `AclAuthorizer` even when running KRaft. We should update this test to use 
> `StandardAuthorizer` (and probably change the name while we're at it).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13889) Fix AclsDelta to handle ACCESS_CONTROL_ENTRY_RECORD quickly followed by REMOVE_ACCESS_CONTROL_ENTRY_RECORD for same ACL

2022-05-21 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13889.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Fix AclsDelta to handle ACCESS_CONTROL_ENTRY_RECORD quickly followed by 
> REMOVE_ACCESS_CONTROL_ENTRY_RECORD for same ACL
> ---
>
> Key: KAFKA-13889
> URL: https://issues.apache.org/jira/browse/KAFKA-13889
> Project: Kafka
>  Issue Type: Bug
>Reporter: Andrew Grant
>Priority: Major
> Fix For: 3.3.0
>
>
> In 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/image/AclsDelta.java#L64]
>  we store the pending deletion in the changes map. This could override a 
> creation that might have just happened. This is an issue because in 
> BrokerMetadataPublisher this results in us making a removeAcl call which 
> finally results in 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizerData.java#L203]
>  being executed, and this code throws an exception if the ACL isn't in the Map 
> yet. If the ACCESS_CONTROL_ENTRY_RECORD event never got processed by 
> BrokerMetadataPublisher, then the ACL won't be in the Map yet.
> My feeling is that we should make removeAcl idempotent: it returns success 
> whether or not the ACL still exists, so removing an already-deleted ACL is not 
> an error. Maybe we'd just log a warning in that case?
> Note, I don't think the AclControlManager has this issue because it doesn't 
> batch the events like AclsDelta does. However, we still do throw a 
> RuntimeException here 
> [https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/AclControlManager.java#L197]
>  - maybe we should still follow the same logic (if we make the fix suggested 
> above) and just log a warning if the ACL doesn't exist in the Map?
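> As a sketch of the idempotent removal suggested above (a toy map-backed store with 
> hypothetical types, not the real StandardAuthorizerData), removing an ACL that is 
> already gone would log a warning instead of throwing:
> {code:java}
> import java.util.Map;
> import java.util.UUID;
> import java.util.concurrent.ConcurrentHashMap;
>
> class IdempotentAclRemovalSketch {
>     private final Map<UUID, String> aclsById = new ConcurrentHashMap<>();
>
>     void addAcl(UUID id, String acl) {
>         aclsById.put(id, acl);
>     }
>
>     void removeAcl(UUID id) {
>         if (aclsById.remove(id) == null) {
>             // Previously this situation threw a RuntimeException; treating it as a
>             // no-op makes ADD quickly followed by REMOVE safe to replay.
>             System.out.println("WARN: removeAcl(" + id + "): ACL not found, ignoring");
>         }
>     }
>
>     public static void main(String[] args) {
>         IdempotentAclRemovalSketch store = new IdempotentAclRemovalSketch();
>         UUID id = UUID.randomUUID();
>         store.removeAcl(id);                  // warns, does not throw
>         store.addAcl(id, "ALLOW READ topic"); // create
>         store.removeAcl(id);                  // removes
>         store.removeAcl(id);                  // warns again, still no exception
>     }
> }
> {code}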



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13923) ZooKeeperAuthorizerTest should use standard authorizer for kraft

2022-05-20 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13923:
---

 Summary: ZooKeeperAuthorizerTest should use standard authorizer 
for kraft
 Key: KAFKA-13923
 URL: https://issues.apache.org/jira/browse/KAFKA-13923
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson


Our system test `ZooKeeperAuthorizerTest` relies on the zk-based 
`AclAuthorizer` even when running KRaft. We should update this test to use 
`StandardAuthorizer` (and probably change the name while we're at it).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13863) Prevent null config value when create topic in KRaft mode

2022-05-19 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13863.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Prevent null config value when create topic in KRaft mode
> -
>
> Key: KAFKA-13863
> URL: https://issues.apache.org/jira/browse/KAFKA-13863
> Project: Kafka
>  Issue Type: Bug
>Reporter: dengziming
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13837) Return error for Fetch requests from unrecognized followers

2022-05-18 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13837.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Return error for Fetch requests from unrecognized followers
> ---
>
> Key: KAFKA-13837
> URL: https://issues.apache.org/jira/browse/KAFKA-13837
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.3.0
>
>
> If the leader of a partition receives a request from a replica which is not 
> in the current replica set, we currently return an empty fetch response with 
> no error. I think the rationale for this is that the leader may not have 
> received the latest `LeaderAndIsr` update which adds the replica, so we just 
> want the follower to retry. The problem with this is that if the 
> `LeaderAndIsr` request never arrives, then there might not be an external 
> indication of a problem. This probably was the only reasonable option before 
> we added the leader epoch to the `Fetch` request API. Now that we have it, it 
> would be clearer to return an `UNKNOWN_LEADER_EPOCH` error to indicate that 
> the (replicaId, leaderEpoch) tuple is not recognized. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13914) Implement kafka-metadata-quorum.sh

2022-05-18 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13914:
---

 Summary: Implement kafka-metadata-quorum.sh
 Key: KAFKA-13914
 URL: https://issues.apache.org/jira/browse/KAFKA-13914
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


KIP-595 documents a tool for describing quorum status 
`kafka-metadata-quorum.sh`: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-ToolingSupport.]
  We need to implement this.

Note that this depends on the Admin API for `DescribeQuorum`, which is proposed 
in KIP-836: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-836%3A+Addition+of+Information+in+DescribeQuorumResponse+about+Voter+Lag.]
 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13899) Inconsistent error codes returned from AlterConfig APIs

2022-05-16 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13899.
-
Fix Version/s: 3.2.1
   Resolution: Fixed

> Inconsistent error codes returned from AlterConfig APIs
> ---
>
> Key: KAFKA-13899
> URL: https://issues.apache.org/jira/browse/KAFKA-13899
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.2.1
>
>
> In the AlterConfigs/IncrementalAlterConfigs zk handler, we return 
> INVALID_REQUEST and INVALID_CONFIG inconsistently. The problem is in 
> `LogConfig.validate`. We may either return `ConfigException` or 
> `InvalidConfigException`. When the first of these is thrown, we catch it and 
> convert to INVALID_REQUEST. It seems more consistent to convert to 
> INVALID_CONFIG.
> Note that the kraft implementation returns INVALID_CONFIG consistently.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13899) Inconsistent error codes returned from AlterConfig APIs

2022-05-13 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13899:
---

 Summary: Inconsistent error codes returned from AlterConfig APIs
 Key: KAFKA-13899
 URL: https://issues.apache.org/jira/browse/KAFKA-13899
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


In the AlterConfigs/IncrementalAlterConfigs zk handler, we return 
INVALID_REQUEST and INVALID_CONFIG inconsistently. The problem is in 
`LogConfig.validate`. We may either return `ConfigException` or 
`InvalidConfigException`. When the first of these is thrown, we catch it and 
convert to INVALID_REQUEST. It seems more consistent to convert to 
INVALID_CONFIG.

Note that the kraft implementation returns INVALID_CONFIG consistently.
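
A sketch of the more consistent mapping (validateLogConfig stands in for 
LogConfig.validate; this is not the actual handler code): a ConfigException raised 
during validation is converted to INVALID_CONFIG, matching what the kraft 
implementation already returns, instead of INVALID_REQUEST.
{code:java}
import org.apache.kafka.common.config.ConfigException;
import org.apache.kafka.common.protocol.Errors;

class AlterConfigErrorMappingSketch {
    static Errors validate(Runnable validateLogConfig) {
        try {
            validateLogConfig.run();
            return Errors.NONE;
        } catch (ConfigException e) {
            // The zk handler used to map this to INVALID_REQUEST.
            return Errors.INVALID_CONFIG;
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(() -> { throw new ConfigException("bad value"); })); // INVALID_CONFIG
        System.out.println(validate(() -> { }));                                         // NONE
    }
}
{code}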



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13862) Add And Subtract multiple config values is not supported in KRaft mode

2022-05-10 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13862.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Add And Subtract multiple config values is not supported in KRaft mode
> --
>
> Key: KAFKA-13862
> URL: https://issues.apache.org/jira/browse/KAFKA-13862
> Project: Kafka
>  Issue Type: Bug
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13869) Update quota callback metadata in KRaft

2022-05-03 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13869:
---

 Summary: Update quota callback metadata in KRaft
 Key: KAFKA-13869
 URL: https://issues.apache.org/jira/browse/KAFKA-13869
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


The `ClientQuotaCallback` interface implements a method 
`updateClusterMetadata`, which allows the callback to take partition 
assignments into account when assigning quotas. For zk, this method is called 
after receiving `UpdateMetadata` requests from the controller. We do not yet 
have this implemented in KRaft for updates from the metadata log.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13858) Kraft should not shutdown metadata listener until controller shutdown is finished

2022-04-26 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13858:
---

 Summary: Kraft should not shutdown metadata listener until 
controller shutdown is finished
 Key: KAFKA-13858
 URL: https://issues.apache.org/jira/browse/KAFKA-13858
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


When the kraft broker begins controlled shutdown, it immediately disables the 
metadata listener. This means that metadata changes as part of the controlled 
shutdown do not get sent to the respective components. For partitions that the 
broker is follower of, that is what we want. It prevents the follower from 
being able to rejoin the ISR while still shutting down. But for partitions that 
the broker is leading, it means the leader will remain active until controlled 
shutdown is complete.

In the zk world, we have an explicit request `StopReplica` which serves the 
purpose of shutting down both follower and leader, but we don't have something 
similar in kraft. For KRaft, we may not necessarily need an explicit signal 
like this. We know that the broker is shutting down, so we can treat partition 
changes as implicit `StopReplica` requests rather than going through the normal 
`LeaderAndIsr` flow.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13837) Return error for Fetch requests from unrecognized followers

2022-04-19 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13837:
---

 Summary: Return error for Fetch requests from unrecognized 
followers
 Key: KAFKA-13837
 URL: https://issues.apache.org/jira/browse/KAFKA-13837
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


If the leader of a partition receives a request from a replica which is not in 
the current replica set, we currently return an empty fetch response with no 
error. I think the rationale for this is that the leader may not have received 
the latest `LeaderAndIsr` update which adds the replica, so we just want the 
follower to retry. The problem with this is that if the `LeaderAndIsr` request 
never arrives, then there might not be an external indication of a problem. 
This probably was the only reasonable option before we added the leader epoch 
to the `Fetch` request API. Now that we have it, it would be clearer to return 
an `UNKNOWN_LEADER_EPOCH` error to indicate that the (replicaId, leaderEpoch) 
tuple is not recognized. 
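
A sketch of the proposed leader-side check (hypothetical method, not the actual 
ReplicaManager code): a fetch from a replica id that is not in the current replica set 
gets an explicit UNKNOWN_LEADER_EPOCH error rather than a silent empty response.
{code:java}
import java.util.Set;
import org.apache.kafka.common.protocol.Errors;

class FollowerFetchValidationSketch {
    static Errors validateFollowerFetch(int replicaId, Set<Integer> currentReplicas) {
        if (!currentReplicas.contains(replicaId)) {
            // The (replicaId, leaderEpoch) tuple is not recognized by the leader.
            return Errors.UNKNOWN_LEADER_EPOCH;
        }
        return Errors.NONE;
    }

    public static void main(String[] args) {
        Set<Integer> replicas = Set.of(1, 2, 3);
        System.out.println(validateFollowerFetch(4, replicas)); // UNKNOWN_LEADER_EPOCH
        System.out.println(validateFollowerFetch(2, replicas)); // NONE
    }
}
{code}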



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13782) Producer may fail to add the correct partition to transaction

2022-04-05 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13782.
-
Resolution: Fixed

> Producer may fail to add the correct partition to transaction
> -
>
> Key: KAFKA-13782
> URL: https://issues.apache.org/jira/browse/KAFKA-13782
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
>
> In KAFKA-13412, we changed the logic to add partitions to transactions in the 
> producer. The intention was to ensure that the partition is added in 
> `TransactionManager` before the record is appended to the 
> `RecordAccumulator`. However, this does not take into account the possibility 
> that the originally selected partition may be changed if `abortForNewBatch` 
> is set in `RecordAppendResult` in the call to `RecordAccumulator.append`. 
> When this happens, the partitioner can choose a different partition, which 
> means that the `TransactionManager` would be tracking the wrong partition.
> I think the consequence of this is that the batches sent to this partition 
> would get stuck in the `RecordAccumulator` until they timed out because we 
> validate before sending that the partition has been added correctly to the 
> transaction.
> Note that KAFKA-13412 has not been included in any release, so there are no 
> affected versions.
> Thanks to [~alivshits] for identifying the bug.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13794) Producer batch lost silently in TransactionManager

2022-04-05 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13794.
-
Fix Version/s: 3.1.1
   3.0.2
   Resolution: Fixed

> Producer batch lost silently in TransactionManager
> --
>
> Key: KAFKA-13794
> URL: https://issues.apache.org/jira/browse/KAFKA-13794
> Project: Kafka
>  Issue Type: Bug
>Reporter: xuexiaoyue
>Priority: Major
> Fix For: 3.1.1, 3.0.2
>
>
> When idempotence is enabled and a batch reaches its request.timeout.ms but has 
> not yet reached delivery.timeout.ms, it will be retried and wait for another 
> request.timeout.ms. During this interval, delivery.timeout.ms may be reached, 
> and the Sender will remove this in-flight batch and bump the producer epoch 
> because of the unresolved sequence; the sequence of this partition is then 
> reset to 0.
> At this time, if a new batch is sent to the same partition and the former 
> batch reaches request.timeout.ms again, we will see an exception being thrown 
> out by NetworkClient:
> {code:java}
> [ERROR] [kafka-producer-network-thread | producer-1] 
> org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] 
> Uncaught error in request completion:
>  java.lang.IllegalStateException: We are re-enqueueing a batch which is not 
> tracked as part of the in flight requests. batch.topicPartition: 
> txn_test_1648891362900-2; batch.baseSequence: 0
>    at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.insertInSequenceOrder(RecordAccumulator.java:388)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.reenqueue(RecordAccumulator.java:334)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.reenqueueBatch(Sender.java:668)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:622)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:548)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_102] {code}
> The cause of this is that inflightBatchesBySequence in TransactionManager is 
> not being removed correctly: one batch may be removed by another batch with 
> the same sequence number.
> The potential consequence I can think of is that send progress will be blocked 
> until the latter batch is expired by delivery.timeout.ms.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13790) ReplicaManager should be robust to all partition updates from kraft metadata log

2022-03-31 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13790:
---

 Summary: ReplicaManager should be robust to all partition updates 
from kraft metadata log
 Key: KAFKA-13790
 URL: https://issues.apache.org/jira/browse/KAFKA-13790
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


There are two ways that partition state can be updated in the zk world: one is 
through `LeaderAndIsr` requests and one is through `AlterPartition` responses. 
All changes made to partition state result in new LeaderAndIsr requests, but 
replicas will ignore them if the leader epoch is less than or equal to the 
current known leader epoch. Basically it works like this:
 * Changes made by the leader are done through AlterPartition requests. These 
changes bump the partition epoch (or zk version), but leave the leader epoch 
unchanged. LeaderAndIsr requests are sent by the controller, but replicas 
ignore them. Partition state is instead only updated when the AlterIsr response 
is received.
 * Changes made by the controller are made directly by the controller and 
always result in a leader epoch bump. These changes are sent to replicas 
through LeaderAndIsr requests and are applied by replicas.

The code in `kafka.server.ReplicaManager` and `kafka.cluster.Partition` are 
built on top of these assumptions. The logic in `makeLeader`, for example, 
assumes that the leader epoch has indeed been bumped. Specifically, follower 
state gets reset and a new entry is written to the leader epoch cache.

In KRaft, we also have two paths to update partition state. One is 
AlterPartition, just like in the zk world. The second is updates received from 
the metadata log. These follow the same path as LeaderAndIsr requests for the 
most part, but a big difference is that all changes are sent down to 
`kafka.cluster.Partition`, even those which do not have a bumped leader epoch. 
This breaks the assumptions mentioned above in `makeLeader`, which could result 
in leader epoch cache inconsistency. Another side effect of this on the 
follower side is that replica fetchers for updated partitions get unnecessarily 
restarted. There may be others as well.

We need to either replicate the same logic as on the zookeeper side or make the 
logic robust to all updates, including those without a leader epoch bump.
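
A sketch of the kind of guard this implies (hypothetical types, not the actual 
Partition code): follower state and the leader epoch cache are only reset when the 
leader epoch has actually advanced, so metadata-log updates without an epoch bump 
(e.g. ISR-only changes) leave replica state and fetchers alone.
{code:java}
class PartitionUpdateSketch {
    private int currentLeaderEpoch = 5;

    void applyPartitionChange(int newLeaderEpoch, Runnable resetFollowerState) {
        if (newLeaderEpoch > currentLeaderEpoch) {
            // A genuine leadership change: safe to reset follower state, update the
            // leader epoch cache, and restart replica fetchers.
            resetFollowerState.run();
            currentLeaderEpoch = newLeaderEpoch;
        }
        // Same epoch: only the partition epoch changed (e.g. an ISR update), so
        // follower state and the epoch cache are left untouched.
    }

    public static void main(String[] args) {
        PartitionUpdateSketch partition = new PartitionUpdateSketch();
        partition.applyPartitionChange(5, () -> System.out.println("reset")); // no-op
        partition.applyPartitionChange(6, () -> System.out.println("reset")); // resets once
    }
}
{code}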



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13789) Fragile tag ordering in metric mbean names

2022-03-31 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13789:
---

 Summary: Fragile tag ordering in metric mbean names
 Key: KAFKA-13789
 URL: https://issues.apache.org/jira/browse/KAFKA-13789
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


We noticed that mbean name creation logic is a bit fragile in 
`KafkaMetricsGroup`, which many server components rely on. We rely on the 
ordering of tags in the underlying map collection. Any change to the map 
implementation could result in a different tag ordering, which would result in 
a different mbean name.

In [https://github.com/apache/kafka/pull/11970], we reimplemented the metric 
naming function to rely on LinkedHashMap so that the ordering is explicit. We 
should try to upgrade current metrics to rely on the new method, which probably 
will involve ensuring that we have good test coverage for registered metrics. 
At a minimum, we should ensure new metrics use the explicit ordering. Perhaps 
we can consider deprecating the old methods or creating a new 
`SafeKafkaMetricsGroup` implementation.
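
A sketch of the explicit-ordering approach (hypothetical helper, not the actual 
KafkaMetricsGroup API): tags are supplied in a LinkedHashMap so the rendered mbean 
name depends only on insertion order, never on the iteration order of whatever map 
implementation happens to be used.
{code:java}
import java.util.LinkedHashMap;
import java.util.stream.Collectors;

class MBeanNameSketch {
    static String mbeanName(String group, String type, String name,
                            LinkedHashMap<String, String> tags) {
        String tagSuffix = tags.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(","));
        return group + ":type=" + type + ",name=" + name
            + (tagSuffix.isEmpty() ? "" : "," + tagSuffix);
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> tags = new LinkedHashMap<>();
        tags.put("topic", "foo");   // insertion order is preserved...
        tags.put("partition", "0"); // ...so these tags never flip around in the name
        System.out.println(mbeanName("kafka.log", "Log", "Size", tags));
        // kafka.log:type=Log,name=Size,topic=foo,partition=0
    }
}
{code}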



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13782) Producer may fail to add the correct partition to transaction

2022-03-29 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13782:
---

 Summary: Producer may fail to add the correct partition to 
transaction
 Key: KAFKA-13782
 URL: https://issues.apache.org/jira/browse/KAFKA-13782
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
 Fix For: 3.2.0, 3.1.1


In KAFKA-13412, we changed the logic to add partitions to transactions in the 
producer. The intention was to ensure that the partition is added in 
`TransactionManager` before the record is appended to the `RecordAccumulator`. 
However, this does not take into account the possibility that the originally 
selected partition may be changed if `abortForNewBatch` is set in 
`RecordAppendResult` in the call to `RecordAccumulator.append`. When this 
happens, the partitioner can choose a different partition, which means that the 
`TransactionManager` would be tracking the wrong partition.

I think the consequence of this is that the batches sent to this partition 
would get stuck in the `RecordAccumulator` until they timed out because we 
validate before sending that the partition has been added correctly to the 
transaction.

Note that KAFKA-13412 has not been included in any release, so there are no 
affected versions.

Thanks to [~alivshits] for identifying the bug.
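
A toy sketch of the ordering problem (hypothetical types, not the producer internals): 
the transaction tracker records the partition chosen before the append, but when the 
append signals abortForNewBatch the partitioner can re-pick a different partition that 
was never added to the transaction.
{code:java}
import java.util.HashSet;
import java.util.Set;

class WrongPartitionTrackedSketch {
    record AppendResult(boolean abortForNewBatch) {}

    public static void main(String[] args) {
        Set<Integer> partitionsInTransaction = new HashSet<>();

        int partition = 0;                             // partitioner's first choice
        partitionsInTransaction.add(partition);        // added to the transaction up front
        AppendResult result = new AppendResult(true);  // accumulator asks for a new batch

        if (result.abortForNewBatch()) {
            partition = 1; // the partitioner re-picks on the retry...
            // ...but the transaction still only tracks partition 0, so batches for
            // partition 1 fail the pre-send check and sit in the accumulator until
            // they expire.
        }
        System.out.println("tracked=" + partitionsInTransaction + ", actual partition=" + partition);
    }
}
{code}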



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13781) Ensure consistent inter-broker listeners in KRaft

2022-03-29 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13781:
---

 Summary: Ensure consistent inter-broker listeners in KRaft
 Key: KAFKA-13781
 URL: https://issues.apache.org/jira/browse/KAFKA-13781
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


A common cause of cluster misconfiguration is having mismatched inter-broker 
listeners. This can cause replication to fail, or in the case of zk clusters, 
it can prevent the controller from sending LeaderAndIsr state to the brokers. 
In KRaft, we can detect the misconfiguration at registration time on the 
controller and let the broker fail startup. This would allow users to detect 
the problem much more quickly.
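
A sketch of the registration-time check this suggests (hypothetical names, not the 
actual controller code): the controller rejects a broker whose advertised listeners do 
not include the cluster's inter-broker listener, so the misconfiguration fails fast at 
startup instead of surfacing later as broken replication.
{code:java}
import java.util.Set;

class ListenerConsistencyCheckSketch {
    static void validateRegistration(String interBrokerListenerName, Set<String> advertisedListeners) {
        if (!advertisedListeners.contains(interBrokerListenerName)) {
            throw new IllegalStateException("Broker registration is missing inter-broker listener '"
                + interBrokerListenerName + "'; advertised listeners: " + advertisedListeners);
        }
    }

    public static void main(String[] args) {
        validateRegistration("REPLICATION", Set.of("REPLICATION", "PLAINTEXT")); // accepted
        validateRegistration("REPLICATION", Set.of("PLAINTEXT"));                // rejected at startup
    }
}
{code}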



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13753) Log cleaner should preserve transaction metadata in index until corresponding marker is removed

2022-03-17 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13753:
---

 Summary: Log cleaner should preserve transaction metadata in index until 
corresponding marker is removed
 Key: KAFKA-13753
 URL: https://issues.apache.org/jira/browse/KAFKA-13753
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


Currently the log cleaner will remove aborted transactions from the index as 
soon as it detects that the data from the transaction is gone. It does not wait 
until the corresponding marker has also been removed. Although it is extremely 
unlikely, it seems possible today that a Fetch might fail to return the aborted 
transaction metadata correctly if a log cleaning occurs concurrently. This is 
because the collection of aborted transactions is only done after reading the 
data from the log. It would be safer to preserve the aborted transaction 
metadata in the index until the marker is also removed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13727) Edge case in cleaner can result in premature removal of ABORT marker

2022-03-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13727.
-
Fix Version/s: 2.8.2
   3.1.1
   3.0.2
   Resolution: Fixed

> Edge case in cleaner can result in premature removal of ABORT marker
> 
>
> Key: KAFKA-13727
> URL: https://issues.apache.org/jira/browse/KAFKA-13727
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.8.2, 3.1.1, 3.0.2
>
>
> The log cleaner works by first building a map of the active keys beginning 
> from the dirty offset, and then scanning forward from the beginning of the 
> log to decide which records should be retained based on whether they are 
> included in the map. The map of keys has a limited size. As soon as it fills 
> up, we stop building it. The offset corresponding to the last record that was 
> included in the map becomes the next dirty offset. Then when we are cleaning, 
> we stop scanning forward at the dirty offset. Or to be more precise, we 
> continue scanning until the end of the segment which includes the dirty 
> offset, but all records above that offset are copied as is without checking 
> the map of active keys. 
> Compaction is complicated by the presence of transactions. The cleaner must 
> keep track of which transactions have data remaining so that it can tell when 
> it is safe to remove the respective markers. It works a bit like the 
> consumer. Before scanning a segment, the cleaner consults the aborted 
> transaction index to figure out which transactions have been aborted. All 
> other transactions are considered committed.
> The problem we have found is that the cleaner does not take into account the 
> range of offsets between the dirty offset and the end offset of the segment 
> containing it when querying ahead for aborted transactions. This means that 
> when the cleaner is scanning forward from the dirty offset, it does not have 
> the complete set of aborted transactions. The main consequence of this is 
> that abort markers associated with transactions which start within this range 
> of offsets become eligible for deletion even before the corresponding data 
> has been removed from the log.
> Here is an example. Suppose that the log contains the following entries:
> offset=0, key=a
> offset=1, key=b
> offset=2, COMMIT
> offset=3, key=c
> offset=4, key=d
> offset=5, COMMIT
> offset=6, key=b
> offset=7, ABORT
> Suppose we have an offset map which can only contain 2 keys and the dirty 
> offset starts at 0. The first time we scan forward, we will build a map with 
> keys a and b, which will allow us to move the dirty offset up to 3. Due to 
> the issue documented here, we will not detect the aborted transaction 
> starting at offset 6. But it will not be eligible for deletion on this round 
> of cleaning because it is bound by `delete.retention.ms`. Instead, our new 
> logic will set the deletion horizon for this batch based on the current time 
> plus the configured `delete.retention.ms`.
> offset=0, key=a
> offset=1, key=b
> offset=2, COMMIT
> offset=3, key=c
> offset=4, key=d
> offset=5, COMMIT
> offset=6, key=b
> offset=7, ABORT (deleteHorizon: N)
> Suppose that the time reaches N+1 before the next cleaning. We will begin 
> from the dirty offset of 3 and collect keys c and d before stopping at offset 
> 6. Again, we will not detect the aborted transaction beginning at offset 6 
> since it is out of the range. This time when we scan, the marker at offset 7 
> will be deleted because the transaction will be seen as empty and now the 
> deletion horizon has passed. So we end up with this state:
> offset=0, key=a
> offset=1, key=b
> offset=2, COMMIT
> offset=3, key=c
> offset=4, key=d
> offset=5, COMMIT
> offset=6, key=b
> Effectively it becomes a hanging transaction. The interesting thing is that 
> we might not even detect it. As far as the leader is concerned, it had 
> already completed that transaction, so it is not expecting any additional 
> markers. The transaction index would have been rewritten without the aborted 
> transaction when the log was cleaned, so any consumer fetching the data would 
> see the transaction as committed. On the other hand, if we did a reassignment 
> to a new replica, or if we had to rebuild the full log state during recovery, 
> then we would suddenly detect it.
> I am not sure how likely this scenario is in practice. I think it's fair to 
> say it is an extremely rare case. The cleaner has to fail to clean a full 
> segment at least two times and you still need enough time to pass for the 
> marker's deletion horizon to be reached. Perhaps it is possible if the 
> cardinality of keys is very high and the configured memory limit for the cleaner is low.

[jira] [Created] (KAFKA-13727) Edge case in cleaner can result in premature removal of ABORT marker

2022-03-10 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13727:
---

 Summary: Edge case in cleaner can result in premature removal of 
ABORT marker
 Key: KAFKA-13727
 URL: https://issues.apache.org/jira/browse/KAFKA-13727
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


The log cleaner works by first building a map of the active keys beginning from 
the dirty offset, and then scanning forward from the beginning of the log to 
decide which records should be retained based on whether they are included in 
the map. The map of keys has a limited size. As soon as it fills up, we stop 
building it. The offset corresponding to the last record that was included in 
the map becomes the next dirty offset. Then when we are cleaning, we stop 
scanning forward at the dirty offset. Or to be more precise, we continue 
scanning until the end of the segment which includes the dirty offset, but all 
records above that offset are copied as is without checking the map of active 
keys. 

Compaction is complicated by the presence of transactions. The cleaner must 
keep track of which transactions have data remaining so that it can tell when 
it is safe to remove the respective markers. It works a bit like the consumer. 
Before scanning a segment, the cleaner consults the aborted transaction index 
to figure out which transactions have been aborted. All other transactions are 
considered committed.

The problem we have found is that the cleaner does not take into account the 
range of offsets between the dirty offset and the end offset of the segment 
containing it when querying ahead for aborted transactions. This means that 
when the cleaner is scanning forward from the dirty offset, it does not have 
the complete set of aborted transactions. The main consequence of this is that 
abort markers associated with transactions which start within this range of 
offsets become eligible for deletion even before the corresponding data has 
been removed from the log.

Here is an example. Suppose that the log contains the following entries:

offset=0, key=a
offset=1, key=b
offset=2, COMMIT
offset=3, key=c
offset=4, key=d
offset=5, COMMIT
offset=6, key=b
offset=7, ABORT

Suppose we have an offset map which can only contain 2 keys and the dirty 
offset starts at 0. The first time we scan forward, we will build a map with 
keys a and b, which will allow us to move the dirty offset up to 3. Due to the 
issue documented here, we will not detect the aborted transaction starting at 
offset 6. But it will not be eligible for deletion on this round of cleaning 
because it is bound by `delete.retention.ms`. Instead, our new logic will set 
the deletion horizon for this batch based on the current time plus the 
configured `delete.retention.ms`.

offset=0, key=a
offset=1, key=b
offset=2, COMMIT
offset=3, key=c
offset=4, key=d
offset=5, COMMIT
offset=6, key=b
offset=7, ABORT (deleteHorizon: N)

Suppose that the time reaches N+1 before the next cleaning. We will begin from 
the dirty offset of 3 and collect keys c and d before stopping at offset 6. 
Again, we will not detect the aborted transaction beginning at offset 6 since 
it is out of the range. This time when we scan, the marker at offset 7 will be 
deleted because the transaction will be seen as empty and now the deletion 
horizon has passed. So we end up with this state:

offset=0, key=a
offset=1, key=b
offset=2, COMMIT
offset=3, key=c
offset=4, key=d
offset=5, COMMIT
offset=6, key=b

Effectively it becomes a hanging transaction. The interesting thing is that we 
might not even detect it. As far as the leader is concerned, it had already 
completed that transaction, so it is not expecting any additional markers. The 
transaction index would have been rewritten without the aborted transaction 
when the log was cleaned, so any consumer fetching the data would see the 
transaction as committed. On the other hand, if we did a reassignment to a new 
replica, or if we had to rebuild the full log state during recovery, then we 
would suddenly detect it.

I am not sure how likely this scenario is in practice. I think it's fair to say 
it is an extremely rare case. The cleaner has to fail to clean a full segment 
at least two times and you still need enough time to pass for the marker's 
deletion horizon to be reached. Perhaps it is possible if the cardinality of 
keys is very high and the configured memory limit for the cleaner is low.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13704) Include TopicId in kafka-topics describe output

2022-03-02 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13704.
-
Resolution: Duplicate

> Include TopicId in kafka-topics describe output
> ---
>
> Key: KAFKA-13704
> URL: https://issues.apache.org/jira/browse/KAFKA-13704
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Priority: Major
>
> It would be helpful if `kafka-topics --describe` displayed the TopicId when 
> we have it available.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13704) Include TopicId in kafka-topics describe output

2022-03-02 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13704:
---

 Summary: Include TopicId in kafka-topics describe output
 Key: KAFKA-13704
 URL: https://issues.apache.org/jira/browse/KAFKA-13704
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


It would be helpful if `kafka-topics --describe` displayed the TopicId when we 
have it available.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

