[jira] [Resolved] (KAFKA-14174) Operation documentation for KRaft

2022-09-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14174.

Resolution: Fixed

> Operation documentation for KRaft
> -
>
> Key: KAFKA-14174
> URL: https://issues.apache.org/jira/browse/KAFKA-14174
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: documentation, kraft
>
> KRaft documentation for 3.3
>  # Disk recovery
>  # External controller is the recommended configuration. The majority of 
> integration tests don't run against co-located mode.
>  # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14265) Prefix ACLs may shadow other prefix ACLs

2022-09-29 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14265.

Resolution: Fixed

> Prefix ACLs may shadow other prefix ACLs
> 
>
> Key: KAFKA-14265
> URL: https://issues.apache.org/jira/browse/KAFKA-14265
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.1
>
>
> Prefix ACLs may shadow other prefix ACLs. Consider the case where we have 
> prefix ACLs for foobar, fooa, and f. If we were matching a resource named 
> "foobar", we'd start scanning at the foobar ACL, hit the fooa ACL, and stop 
> -- missing the f ACL.
> To fix this, we should re-scan for ACLs at the first divergence point (in 
> this case, f) whenever we hit a mismatch of this kind.
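A minimal sketch of the re-scan idea, in Scala with illustrative names (this is
not the actual StandardAuthorizer code): walk the prefix ACLs in reverse sorted
order and, on a mismatch, restart the scan at the longest common prefix of the
mismatched ACL and the resource name instead of stopping.
{code:java}
import scala.collection.immutable.TreeSet

def matchingPrefixes(prefixAcls: TreeSet[String], resource: String): List[String] = {
  var matches = List.empty[String]
  // Candidates are ACL names <= the resource name, visited longest-first.
  var candidates = prefixAcls.rangeTo(resource).toList.reverse
  while (candidates.nonEmpty) {
    val acl = candidates.head
    if (resource.startsWith(acl)) {
      matches ::= acl
      candidates = candidates.tail
    } else {
      // Mismatch: re-scan from the first divergence point, so shorter
      // prefixes such as "f" are not shadowed by "fooa".
      val common = acl.zip(resource).takeWhile { case (a, b) => a == b }.map(_._1).mkString
      candidates = prefixAcls.rangeTo(common).toList.reverse
    }
  }
  matches
}

// matchingPrefixes(TreeSet("f", "fooa", "foobar"), "foobar") == List("f", "foobar")
{code}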



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14259) BrokerRegistration#toString throws an exception, terminating metadata replay

2022-09-28 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14259.

Resolution: Fixed

> BrokerRegistration#toString throws an exception, terminating metadata replay
> 
>
> Key: KAFKA-14259
> URL: https://issues.apache.org/jira/browse/KAFKA-14259
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>
> BrokerRegistration#toString throws an exception, terminating metadata replay, 
> because the sorted() method is used on an entry set rather than a key set.
> {noformat}
> Caused by: java.util.concurrent.ExecutionException: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>     at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>     at kafka.server.BrokerServer.startup(BrokerServer.scala:846)
>     ... 147 more
> Caused by: java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
>     at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
>     at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>     at java.base/java.util.TimSort.sort(TimSort.java:220)
>     at java.base/java.util.Arrays.sort(Arrays.java:1307)
>     at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:353)
>     at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
>     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>     at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
> {noformat}
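A minimal reproduction of the failure mode, in Scala (illustrative only, not
the actual BrokerRegistration code): calling sorted() on a stream of Map
entries requires the entries to be Comparable, which HashMap entries are not,
while sorting the key set is safe.
{code:java}
import java.util.stream.Collectors

val listeners = new java.util.HashMap[String, Integer]()
listeners.put("PLAINTEXT", 9092)

// Throws ClassCastException: HashMap$Node cannot be cast to Comparable.
// listeners.entrySet().stream().sorted().collect(Collectors.toList())

// Sorting the keys works because String implements Comparable.
val sortedKeys = listeners.keySet().stream().sorted().collect(Collectors.toList())
{code}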

[jira] [Resolved] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-26 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14207.

Resolution: Fixed

> Add a 6.10 section for KRaft
> 
>
> Key: KAFKA-14207
> URL: https://issues.apache.org/jira/browse/KAFKA-14207
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: documentation, kraft
>
> The section should talk about:
>  # Limitations
>  # Recommended deployment: external controller
>  # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14241) Implement the snapshot cleanup policy

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14241:
--

 Summary: Implement the snapshot cleanup policy
 Key: KAFKA-14241
 URL: https://issues.apache.org/jira/browse/KAFKA-14241
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.4.0


It looks like the cleanup policy needs to be set to either delete or compact:
{code:java}
        .define(CleanupPolicyProp, LIST, Defaults.CleanupPolicy, 
ValidList.in(LogConfig.Compact, LogConfig.Delete), MEDIUM, CompactDoc,
          KafkaConfig.LogCleanupPolicyProp)
{code}
Neither is correct for KRaft topics. KIP-630 talks about adding a third policy 
called snapshot:
{code:java}
The __cluster_metadata topic will have snapshot as the cleanup.policy. {code}
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-630%3A+Kafka+Raft+Snapshot#KIP630:KafkaRaftSnapshot-ProposedChanges]
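A hedged sketch, in Scala, of the direction KIP-630 describes (the names here
are illustrative, not the shipped LogConfig definitions): accept "snapshot" as
a third cleanup policy value alongside delete and compact.
{code:java}
object CleanupPolicy {
  val Delete = "delete"
  val Compact = "compact"
  val Snapshot = "snapshot" // hypothetical third policy for KRaft topics
  private val valid = Set(Delete, Compact, Snapshot)

  def validate(policies: Seq[String]): Unit =
    policies.filterNot(valid.contains).foreach { p =>
      throw new IllegalArgumentException(s"Invalid cleanup.policy value: $p")
    }
}

// CleanupPolicy.validate(Seq(CleanupPolicy.Snapshot)) // accepted for __cluster_metadata
{code}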



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14238) KRaft replicas can delete segments not included in a snapshot

2022-09-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14238:
--

 Summary: KRaft replicas can delete segments not included in a 
snapshot
 Key: KAFKA-14238
 URL: https://issues.apache.org/jira/browse/KAFKA-14238
 Project: Kafka
  Issue Type: Bug
  Components: core, kraft
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.3.0


We see this in the log:
{code:java}
Deleting segment LogSegment(baseOffset=243864, size=9269150, 
lastModifiedTime=1662486784182, largestRecordTimestamp=Some(1662486784160)) due 
to retention time 60480ms breach based on the largest record timestamp in 
the segment {code}
This then causes {{KafkaRaftClient}} to throw an exception when sending batches 
to the listener:
{code:java}
 java.lang.IllegalStateException: Snapshot expected since next offset of 
org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@195461949 is 
0, log start offset is 369668 and high-watermark is 547379
at 
org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$4(KafkaRaftClient.java:312)
at java.base/java.util.Optional.orElseThrow(Optional.java:403)
at 
org.apache.kafka.raft.KafkaRaftClient.lambda$updateListenersProgress$5(KafkaRaftClient.java:311)
at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:165)
at 
org.apache.kafka.raft.KafkaRaftClient.updateListenersProgress(KafkaRaftClient.java:309){code}
The on-disk state for the cluster metadata partition confirms this:
{code:java}
 ls __cluster_metadata-0/
00369668.index
00369668.log
00369668.timeindex
00503411.index
00503411.log
00503411.snapshot
00503411.timeindex
00548746.snapshot
leader-epoch-checkpoint
partition.metadata
quorum-state{code}
Notice that there are no {{checkpoint}} files and the log doesn't have a 
segment at base offset 0.

This is happening because the {{LogConfig}} used for KRaft sets the retention 
policy to {{delete}}, which causes the method {{deleteOldSegments}} to delete 
old segments even if there is no snapshot for them. For KRaft, Kafka should 
only delete segments that breach the log start offset.

Log configuration for KRaft:
{code:java}
val props = new Properties()
props.put(LogConfig.MaxMessageBytesProp, config.maxBatchSizeInBytes.toString)
props.put(LogConfig.SegmentBytesProp, Int.box(config.logSegmentBytes))
props.put(LogConfig.SegmentMsProp, Long.box(config.logSegmentMillis))
props.put(LogConfig.FileDeleteDelayMsProp, Int.box(Defaults.FileDeleteDelayMs))
LogConfig.validateValues(props)
val defaultLogConfig = LogConfig(props){code}
Segment deletion code:
{code:java}
def deleteOldSegments(): Int = {
  if (config.delete) {
    deleteLogStartOffsetBreachedSegments() +
      deleteRetentionSizeBreachedSegments() +
      deleteRetentionMsBreachedSegments()
  } else {
    deleteLogStartOffsetBreachedSegments()
  }
}{code}
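A minimal sketch of the intended KRaft behavior (illustrative names, not the
actual log-layer code): a segment is only eligible for deletion once its entire
offset range falls below the log start offset, which in turn only advances when
a snapshot covers it.
{code:java}
// Each segment is (baseOffset, nextOffset); return the base offsets to delete.
def deletableKraftSegments(logStartOffset: Long,
                           segments: Seq[(Long, Long)]): Seq[Long] =
  segments.collect {
    case (base, next) if next <= logStartOffset => base
  }

// With logStartOffset = 369668, a segment covering [243864, 369668) is
// deletable, but one covering [369668, 503411) must be retained.
{code}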



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14073) Logging the reason for creating a snapshot

2022-09-13 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14073.

Resolution: Fixed

> Logging the reason for creating a snapshot
> --
>
> Key: KAFKA-14073
> URL: https://issues.apache.org/jira/browse/KAFKA-14073
> Project: Kafka
>  Issue Type: Improvement
>Reporter: dengziming
>Priority: Minor
>  Labels: kraft, newbie
>
> So far we have two reasons for creating a snapshot: 1. X bytes were applied; 
> 2. the metadata version changed. We should log the reason when creating a 
> snapshot on both the broker side and the controller side. See 
> https://github.com/apache/kafka/pull/12265#discussion_r915972383



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14222) Exhausted BatchMemoryPool

2022-09-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14222:
--

 Summary: Exhausted BatchMemoryPool
 Key: KAFKA-14222
 URL: https://issues.apache.org/jira/browse/KAFKA-14222
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0


For a large number of topics and partitions, the broker can encounter this issue:
{code:java}
[2022-09-12 14:14:42,114] ERROR [BrokerMetadataSnapshotter id=4] Unexpected 
error handling CreateSnapshotEvent 
(kafka.server.metadata.BrokerMetadataSnapshotter)
org.apache.kafka.raft.errors.BufferAllocationException: Append failed because 
we failed to allocate memory to write the batch
at 
org.apache.kafka.raft.internals.BatchAccumulator.append(BatchAccumulator.java:161)
at 
org.apache.kafka.raft.internals.BatchAccumulator.append(BatchAccumulator.java:112)
at 
org.apache.kafka.snapshot.RecordsSnapshotWriter.append(RecordsSnapshotWriter.java:167)
at 
kafka.server.metadata.RecordListConsumer.accept(BrokerMetadataSnapshotter.scala:49)
at 
kafka.server.metadata.RecordListConsumer.accept(BrokerMetadataSnapshotter.scala:42)
at org.apache.kafka.image.TopicImage.write(TopicImage.java:78)
at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:79)
at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:129)
at 
kafka.server.metadata.BrokerMetadataSnapshotter$CreateSnapshotEvent.run(BrokerMetadataSnapshotter.scala:116)
at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
at java.base/java.lang.Thread.run(Thread.java:829) {code}
This can happen because the snapshot is larger than {{5 * 8MB}}.
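A back-of-envelope check of the failure condition, using the sizes from the
description (the BatchMemoryPool internals are not reproduced here):
{code:java}
val maxBatches = 5
val batchSizeBytes = 8 * 1024 * 1024                 // 8 MB per batch
val poolCapacityBytes = maxBatches * batchSizeBytes  // 40 MB total

// A snapshot whose in-flight batches exceed poolCapacityBytes fails the
// allocation with BufferAllocationException, as in the stack trace above.
{code}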



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14204) QuorumController must correctly handle overly large batches

2022-09-08 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14204.

Resolution: Fixed

> QuorumController must correctly handle overly large batches
> ---
>
> Key: KAFKA-14204
> URL: https://issues.apache.org/jira/browse/KAFKA-14204
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, kraft
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Blocker
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14207) Add a 6.10 section for KRaft

2022-09-07 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14207:
--

 Summary: Add a 6.10 section for KRaft
 Key: KAFKA-14207
 URL: https://issues.apache.org/jira/browse/KAFKA-14207
 Project: Kafka
  Issue Type: Sub-task
  Components: documentation
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0


The section should talk about:
 # Limitations
 # Recommended deployment: external controller
 # How to start a KRaft cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14205) Document how to recover from kraft controller disk failure

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14205:
--

 Summary: Document how to recover from kraft controller disk failure
 Key: KAFKA-14205
 URL: https://issues.apache.org/jira/browse/KAFKA-14205
 Project: Kafka
  Issue Type: Sub-task
  Components: documentation
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14203) KRaft broker should disable snapshot generation after error replaying the metadata log

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14203:
--

 Summary: KRaft broker should disable snapshot generation after 
error replaying the metadata log
 Key: KAFKA-14203
 URL: https://issues.apache.org/jira/browse/KAFKA-14203
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.3.0
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.3.0


The broker skips records for which there was an error when replaying the log. 
This means that the MetadataImage has diverged from the state persisted in the 
log. The broker should disable snapshot generation; otherwise, the next time a 
snapshot gets generated it will persist inconsistent data.
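A minimal sketch of the proposed guard (illustrative names, not the actual
broker code): once replay of a metadata record fails, latch a fault flag and
refuse to generate any further snapshots from the diverged image.
{code:java}
object SnapshotGuard {
  @volatile var metadataReplayFaulted = false

  def handleReplayError(error: Throwable): Unit = {
    // The in-memory MetadataImage may have diverged from the log.
    metadataReplayFaulted = true
  }

  def maybeGenerateSnapshot(generate: () => Unit): Unit =
    if (!metadataReplayFaulted) generate()
    // else skip, so inconsistent state is never persisted
}
{code}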



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-09-06 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14179.

Fix Version/s: (was: 3.3.0)
   Resolution: Duplicate

> Improve docs/upgrade.html to talk about metadata.version upgrades
> -
>
> Key: KAFKA-14179
> URL: https://issues.apache.org/jira/browse/KAFKA-14179
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Blocker
>  Labels: documentation, kraft
>
> The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
> upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14188) Quickstart for KRaft

2022-08-29 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14188:
--

 Summary: Quickstart for KRaft
 Key: KAFKA-14188
 URL: https://issues.apache.org/jira/browse/KAFKA-14188
 Project: Kafka
  Issue Type: Task
  Components: documentation, kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


Either:
 # Improve the quick start documentation to talk about both KRaft and ZK
 # Create a KRaft quick start that is very similar to the ZK quick start but 
uses a different startup process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14183) Kraft bootstrap metadata file should use snapshot header/footer

2022-08-27 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14183.

Resolution: Fixed

> Kraft bootstrap metadata file should use snapshot header/footer
> ---
>
> Key: KAFKA-14183
> URL: https://issues.apache.org/jira/browse/KAFKA-14183
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.3.0
>
>
> The bootstrap checkpoint file that we use in kraft is intended to follow the 
> usual snapshot format, but currently it does not include the header/footer 
> control records. The main purpose of these at the moment is to set a version 
> for the checkpoint file itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14142) Improve information returned about the cluster metadata partition

2022-08-25 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-14142.

Resolution: Won't Fix

We discussed this and we decided that the kafka-metadata-quorum tool already 
returns enough information to determine this.

> Improve information returned about the cluster metadata partition
> -
>
> Key: KAFKA-14142
> URL: https://issues.apache.org/jira/browse/KAFKA-14142
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 3.3.0
>
>
> The Apache Kafka operator needs to know when it is safe to format and start a 
> KRaft Controller that had a disk failure of the metadata log dir.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14179) Improve docs/upgrade.html to talk about metadata.version upgrades

2022-08-24 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14179:
--

 Summary: Improve docs/upgrade.html to talk about metadata.version 
upgrades
 Key: KAFKA-14179
 URL: https://issues.apache.org/jira/browse/KAFKA-14179
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.3.0


The rolling upgrade documentation for 3.3.0 only talks about software and IBP 
upgrades. It doesn't talk about metadata.version upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13911) Rate is calculated as NaN for minimum config values

2022-08-22 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13911.

  Reviewer: Ismael Juma
Resolution: Fixed

Closing as it was merged to trunk and 3.3.

> Rate is calculated as NaN for minimum config values
> ---
>
> Key: KAFKA-13911
> URL: https://issues.apache.org/jira/browse/KAFKA-13911
> Project: Kafka
>  Issue Type: Bug
>Reporter: Divij Vaidya
>Assignee: Divij Vaidya
>Priority: Minor
> Fix For: 3.3.0
>
>
> Implementation of connection creation rate quotas in Kafka is dependent on 
> two configurations:
>  # 
> [quota.window.num|https://kafka.apache.org/documentation.html#brokerconfigs_quota.window.num]
>  # 
> [quota.window.size.seconds|https://kafka.apache.org/documentation.html#brokerconfigs_quota.window.size.seconds]
> The minimum possible value of these configurations is 1 as per the 
> documentation. However, 1 as a minimum value for quota.window.num is invalid 
> and leads to a failure in the rate calculation, as demonstrated below.
> As a proof of the bug, the following unit test fails:
> {code:java}
> @Test
> public void testUseWithMinimumPossibleConfiguration() {
> final Rate r = new Rate();
> MetricConfig config = new MetricConfig().samples(1).timeWindow(1, 
> TimeUnit.SECONDS);
> Time elapsed = new MockTime();
> r.record(config, 1.0, elapsed.milliseconds());
> elapsed.sleep(100);
> r.record(config, 1.0, elapsed.milliseconds());
> elapsed.sleep(1000);
> final Double observedRate = r.measure(config, elapsed.milliseconds());
> assertFalse(Double.isNaN(observedRate));
> } {code}
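A hedged note on the arithmetic (the Rate internals are simplified here): the
measured rate is the recorded total divided by the elapsed window size, and
when purging the single expired sample leaves both at zero, the division
yields NaN rather than a usable rate.
{code:java}
val total = 0.0                // sum over the remaining samples
val elapsedWindowSeconds = 0.0 // window size after the purge

val rate = total / elapsedWindowSeconds
println(rate.isNaN) // true: 0.0 / 0.0 is Double.NaN
{code}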
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14174) Documentation for KRaft

2022-08-22 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14174:
--

 Summary: Documentation for KRaft
 Key: KAFKA-14174
 URL: https://issues.apache.org/jira/browse/KAFKA-14174
 Project: Kafka
  Issue Type: Improvement
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.3.0


KRaft documentation for 3.3
 # Disk recovery
 # Talk about KRaft operation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-08-12 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13959.

Resolution: Fixed

> Controller should unfence Broker with busy metadata log
> ---
>
> Key: KAFKA-13959
> URL: https://issues.apache.org/jira/browse/KAFKA-13959
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.3.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: dengziming
>Priority: Blocker
> Fix For: 3.3.0
>
>
> https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
> for the controller to not unfence a broker if the committed offset keeps 
> increasing.
>  
> One solution to this problem is to require the broker to only catch up to the 
> offset that was the last committed offset when it last sent a heartbeat. For example:
>  # Broker sends a heartbeat with current offset of {{Y}}. The last committed 
> offset is {{X}}. The controller remembers this last committed offset; call it 
> {{X'}}.
>  # Broker sends another heartbeat with current offset of {{Z}}. Unfence 
> the broker if {{Z >= X}} or {{Z >= X'}}.
> Another solution is to unfence the broker when the applied offset of the 
> broker has reached the offset of its own broker registration record.
> This change should also set the default for MetadataMaxIdleIntervalMs back to 
> 500.
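A minimal sketch of the first proposal, in Scala with illustrative names (not
the actual QuorumController code): remember the committed offset observed at
the previous heartbeat and unfence once the broker has caught up to that
offset, so a continuously advancing commit offset cannot keep the broker
fenced forever.
{code:java}
final case class BrokerHeartbeatState(committedOffsetAtLastHeartbeat: Long)

def shouldUnfence(brokerOffset: Long, state: BrokerHeartbeatState): Boolean =
  brokerOffset >= state.committedOffsetAtLastHeartbeat

// Heartbeat 1: broker at Y; the committed offset X is remembered as X'.
// Heartbeat 2: broker at Z is unfenced when Z >= X', even if the current
// committed offset has moved past X' in the meantime.
{code}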



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14145) Faster propagation of high-watermark in KRaft topic partitions

2022-08-05 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14145:
--

 Summary: Faster propagation of high-watermark in KRaft topic 
partitions
 Key: KAFKA-14145
 URL: https://issues.apache.org/jira/browse/KAFKA-14145
 Project: Kafka
  Issue Type: Task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.4.0


Typically, the HWM is increased after one round of Fetch requests from the 
majority of the replicas. The HWM is propagated after another round of Fetch 
requests. If the LEO doesn't change, the propagation of the HWM can be delayed 
by one Fetch wait timeout (500ms).

Looking at the KafkaRaftClient implementation, we would have to track, for 
each replica, both the fetch offset and the last sent high-watermark.

Another issue here is that we changed the KafkaRaftManager so that it doesn't 
set the replica id when it is an observer/broker. Since the HWM is not part of 
the Fetch request, the leader would have to keep track of this in the 
LeaderState.
{code:java}
val nodeId = if (config.processRoles.contains(ControllerRole)) {
  OptionalInt.of(config.nodeId)
} else {
  OptionalInt.empty()
}{code}
We would need to find a better solution for 
https://issues.apache.org/jira/browse/KAFKA-13168 or improve the FETCH request 
so that it includes the HWM.
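A hedged sketch of the bookkeeping this would require (illustrative names, not
the actual LeaderState code): track, per replica, both the fetch offset and the
last high-watermark sent, so a pending Fetch can be completed as soon as either
one advances.
{code:java}
final case class ReplicaProgress(fetchOffset: Long, lastSentHighWatermark: Long)

def shouldCompleteFetch(progress: ReplicaProgress,
                        logEndOffset: Long,
                        highWatermark: Long): Boolean =
  logEndOffset > progress.fetchOffset ||
    highWatermark > progress.lastSentHighWatermark
{code}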



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14142) Improve information returned about the cluster metadata partition

2022-08-04 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-14142:
--

 Summary: Improve information returned about the cluster metadata 
partition
 Key: KAFKA-14142
 URL: https://issues.apache.org/jira/browse/KAFKA-14142
 Project: Kafka
  Issue Type: Improvement
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jason Gustafson
 Fix For: 3.3.0


The Apache Kafka operator needs to know when it is safe to format and start a 
KRaft Controller that had a disk failure of the metadata log dir.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13968) Broker should not generate a snapshot until it has been unfenced

2022-07-12 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13968.

Resolution: Fixed

> Broker should not generate a snapshot until it has been unfenced
> 
>
> Key: KAFKA-13968
> URL: https://issues.apache.org/jira/browse/KAFKA-13968
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: dengziming
>Assignee: dengziming
>Priority: Blocker
> Fix For: 3.3.0
>
>
>  
> There is a bug when computing `FeaturesDelta` which causes us to generate a 
> snapshot on every commit.
>  
> [2022-06-08 13:07:43,010] INFO [BrokerMetadataSnapshotter id=0] Creating a 
> new snapshot at offset 0... 
> (kafka.server.metadata.BrokerMetadataSnapshotter:66)
> [2022-06-08 13:07:43,222] INFO [BrokerMetadataSnapshotter id=0] Creating a 
> new snapshot at offset 2... 
> (kafka.server.metadata.BrokerMetadataSnapshotter:66)
> [2022-06-08 13:07:43,727] INFO [BrokerMetadataSnapshotter id=0] Creating a 
> new snapshot at offset 3... 
> (kafka.server.metadata.BrokerMetadataSnapshotter:66)
> [2022-06-08 13:07:44,228] INFO [BrokerMetadataSnapshotter id=0] Creating a 
> new snapshot at offset 4... 
> (kafka.server.metadata.BrokerMetadataSnapshotter:66)
>  
> Before a broker has been unfenced, it won't start publishing metadata, so 
> it's meaningless to generate a snapshot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13955) Fix failing KRaftClusterTest tests

2022-06-08 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13955.

Resolution: Fixed

> Fix failing KRaftClusterTest tests
> --
>
> Key: KAFKA-13955
> URL: https://issues.apache.org/jira/browse/KAFKA-13955
> Project: Kafka
>  Issue Type: Test
>Reporter: Luke Chen
>Assignee: dengziming
>Priority: Major
>
> Tests are failing with a timeout exception:
> java.util.concurrent.TimeoutException: 
> testCreateClusterAndPerformReassignment() timed out after 120 seconds
>  
> Failing tests:
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testIncrementalAlterConfigs()
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testSetLog4jConfigurations()
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testLegacyAlterConfigs()
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testCreateClusterAndPerformReassignment()
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testUnregisterBroker()
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testCreateClusterAndCreateAndManyTopics()
> Build / JDK 8 and Scala 2.12 / 
> kafka.server.KRaftClusterTest.testCreateClusterAndCreateListDeleteTopic()



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13959:
--

 Summary: Controller should unfence Broker with busy metadata log
 Key: KAFKA-13959
 URL: https://issues.apache.org/jira/browse/KAFKA-13959
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.3.0
Reporter: Jose Armando Garcia Sancio


https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
for the controller to not unfence a broker if the committed offset keeps 
increasing.

 

One solution to this problem is to require the broker to only catch up to the 
offset that was the last committed offset when it last sent a heartbeat. For example:
 # Broker sends a heartbeat with current offset of {{Y}}. The last committed 
offset is {{X}}. The controller remembers this last committed offset; call it 
{{X'}}.
 # Broker sends another heartbeat with current offset of {{Z}}. Unfence the 
broker if {{Z >= X}} or {{Z >= X'}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13883) KIP-835: Monitor Quorum

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13883.

Resolution: Fixed

> KIP-835: Monitor Quorum
> ---
>
> Key: KAFKA-13883
> URL: https://issues.apache.org/jira/browse/KAFKA-13883
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>
> Tracking issue for the implementation of KIP-835.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13918) Schedule or cancel nooprecord write on metadata version change

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13918.

Resolution: Duplicate

> Schedule or cancel nooprecord write on metadata version change
> --
>
> Key: KAFKA-13918
> URL: https://issues.apache.org/jira/browse/KAFKA-13918
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13918) Schedule or cancel nooprecord write on metadata version change

2022-05-19 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13918:
--

 Summary: Schedule or cancel nooprecord write on metadata version 
change
 Key: KAFKA-13918
 URL: https://issues.apache.org/jira/browse/KAFKA-13918
 Project: Kafka
  Issue Type: Sub-task
  Components: controller
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13904) Move BrokerMetadataListener metrics to broker-metadata-metrics

2022-05-14 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13904:
--

 Summary: Move BrokerMetadataListener metrics to 
broker-metadata-metrics
 Key: KAFKA-13904
 URL: https://issues.apache.org/jira/browse/KAFKA-13904
 Project: Kafka
  Issue Type: Bug
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


The metrics in BrokerMetadataListener should be moved to the 
broker-metadata-metrics group. This is okay because those metrics were never 
documented in a KIP and instead are now documented in KIP-835.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (KAFKA-13502) Support configuring BROKER_LOGGER on controller-only KRaft nodes

2022-05-13 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reopened KAFKA-13502:


I accidentally resolved this issue.

> Support configuring BROKER_LOGGER on controller-only KRaft nodes
> 
>
> Key: KAFKA-13502
> URL: https://issues.apache.org/jira/browse/KAFKA-13502
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Priority: Major
>  Labels: kip-500
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13884) KRaft observers are not required to flush on every append

2022-05-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13884:
--

 Summary: KRaft observers are not required to flush on every append
 Key: KAFKA-13884
 URL: https://issues.apache.org/jira/browse/KAFKA-13884
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


The current implementation of the KRaft Client flushes to disk when observers 
append to the log. This is not required since observers don't participate in 
leader election or the advancement of the high-watermark.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13883) KIP-835: Monitor Quorum

2022-05-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13883:
--

 Summary: KIP-835: Monitor Quorum
 Key: KAFKA-13883
 URL: https://issues.apache.org/jira/browse/KAFKA-13883
 Project: Kafka
  Issue Type: Improvement
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


Tracking issue for the implementation of KIP-835.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13806) Check CRC when reading snapshots

2022-04-07 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13806:
--

 Summary: Check CRC when reading snapshots
 Key: KAFKA-13806
 URL: https://issues.apache.org/jira/browse/KAFKA-13806
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13798) KafkaController should send LeaderAndIsr request when LeaderRecoveryState is altered

2022-04-04 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13798:
--

 Summary: KafkaController should send LeaderAndIsr request when 
LeaderRecoveryState is altered
 Key: KAFKA-13798
 URL: https://issues.apache.org/jira/browse/KAFKA-13798
 Project: Kafka
  Issue Type: Task
  Components: controller
Affects Versions: 3.2.0
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


The current implementation of KIP-704 in the ZK Controller only sends a 
LeaderAndIsr request to the followers if the AlterPartition completes a 
reassignment. That means that if there is no reassignment pending then the ZK 
Controller never sends a LeaderAndIsr request to the followers. The controller 
needs to send a LeaderAndIsr request when the partition has recovered, because 
of the "fetch from follower" feature.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13784) DescribeQuorum should return the current leader if the handling node is not the current leader

2022-03-30 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13784:
--

 Summary: DescribeQuorum should return the current leader if the 
handling node is not the current leader
 Key: KAFKA-13784
 URL: https://issues.apache.org/jira/browse/KAFKA-13784
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.2.0
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


For clients calling DescribeQuorum it is not possible to 
discover the current leader. If the request is sent to a node that is not the 
leader, it simply replies with INVALID_REQUEST. KIP-595 mentions that it should 
instead reply with the current leader.

 

> If the response indicates that the intended node is not the current leader, 
> then check the response to see if the {{LeaderId}} has been set. If so, then 
> attempt to retry the request with the new leader.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13682) Implement auto preferred leader election in KRaft Controller

2022-03-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13682.

Resolution: Fixed

> Implement auto preferred leader election in KRaft Controller
> 
>
> Key: KAFKA-13682
> URL: https://issues.apache.org/jira/browse/KAFKA-13682
> Project: Kafka
>  Issue Type: Task
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: kip-500
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13587) Implement unclean leader election in KIP-704

2022-03-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13587.

Resolution: Fixed

> Implement unclean leader election in KIP-704
> 
>
> Key: KAFKA-13587
> URL: https://issues.apache.org/jira/browse/KAFKA-13587
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13754) Follower should reject Fetch request while the leader is recovering

2022-03-17 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13754:
--

 Summary: Follower should reject Fetch request while the leader is 
recovering
 Key: KAFKA-13754
 URL: https://issues.apache.org/jira/browse/KAFKA-13754
 Project: Kafka
  Issue Type: Task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


In the PR for KIP-704 we removed leader recovery state validation from the 
Fetch request handling. This is okay because the leader immediately recovers the partition.

We should enable this validation before implementing log recovery from unclean 
leader election.

The old implementation and test is in this commit: 
https://github.com/apache/kafka/pull/11733/commits/c7e54b8f6cef087deac119d61a46d3586ead72b9



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13696) Topic partition leader should always send AlterPartition when transitioning from RECOVERING to RECOVERED

2022-02-25 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13696:
--

 Summary: Topic partition leader should always send AlterPartition 
when transitioning from RECOVERING to RECOVERED
 Key: KAFKA-13696
 URL: https://issues.apache.org/jira/browse/KAFKA-13696
 Project: Kafka
  Issue Type: Task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13682) Implement auto preferred leader election in KRaft Controller

2022-02-22 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13682:
--

 Summary: Implement auto preferred leader election in KRaft 
Controller
 Key: KAFKA-13682
 URL: https://issues.apache.org/jira/browse/KAFKA-13682
 Project: Kafka
  Issue Type: Task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13621) Resign leader on partition

2022-01-26 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13621:
--

 Summary: Resign leader on partition
 Key: KAFKA-13621
 URL: https://issues.apache.org/jira/browse/KAFKA-13621
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


h1. Motivation

If the current leader A at epoch X gets partitioned from the rest of the quorum, 
quorum voter A will stay leader at epoch X. This happens because voter A will 
never receive a request from the rest of the voters increasing the epoch. The 
requests that typically increase the epoch of past leaders are BeginQuorumEpoch 
and Vote.

In addition, if voter A (leader at epoch X) doesn't get partitioned from the rest 
of the brokers (observers in the KRaft protocol), the brokers will never learn 
about the new quorum leader. This happens because 1. observers learn about the 
leader from the Fetch response and 2. observers send a Fetch request to a random 
voter if the Fetch request times out.

Neither of these two scenarios will cause the broker to send a request to a 
different voter, because the leader at epoch X will never send a different 
leader in the response and the broker will never send a Fetch request to a 
different voter because the Fetch request will never time out.
h1. Proposed Changes

In this scenario A, the leader at epoch X, will stop receiving Fetch 
requests from the majority of the voters. Voter A should resign as leader if the 
Fetch requests from the majority of the voters are old enough. A reasonable value 
for "old enough" is the Fetch timeout value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13502) Support configuring BROKER_LOGGER on controller-only KRaft nodes

2022-01-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13502.

Resolution: Fixed

This issue was fixed by KAFKA-13552.

> Support configuring BROKER_LOGGER on controller-only KRaft nodes
> 
>
> Key: KAFKA-13502
> URL: https://issues.apache.org/jira/browse/KAFKA-13502
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Priority: Major
>  Labels: kip-500
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13552) Unable to dynamically change broker log levels on KRaft

2022-01-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13552.

Resolution: Fixed

> Unable to dynamically change broker log levels on KRaft
> ---
>
> Key: KAFKA-13552
> URL: https://issues.apache.org/jira/browse/KAFKA-13552
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.1.0, 3.0.0
>Reporter: Ron Dagostino
>Assignee: Colin McCabe
>Priority: Major
>
> It is currently not possible to dynamically change the log level in KRaft.  
> For example:
> kafka-configs.sh --bootstrap-server  --alter --add-config 
> "kafka.server.ReplicaManager=DEBUG" --entity-type broker-loggers 
> --entity-name 0
> Results in:
> org.apache.kafka.common.errors.InvalidRequestException: Unexpected resource 
> type BROKER_LOGGER.
> The code to process this request is in ZkAdminManager.alterLogLevelConfigs(). 
>  This needs to be moved out of there, and the functionality has to be 
> processed locally on the broker instead of being forwarded to the KRaft 
> controller.
> It is also an open question as to how we can dynamically alter log levels for 
> a remote KRaft controller.  Connecting directly to it is one possible 
> solution, but that may not be desirable since generally connecting directly 
> to the controller is not necessary.  The ticket for this particular aspect of 
> the issue is https://issues.apache.org/jira/browse/KAFKA-13502



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13587) Implement unclean leader election in KIP-704

2022-01-10 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13587:
--

 Summary: Implement unclean leader election in KIP-704
 Key: KAFKA-13587
 URL: https://issues.apache.org/jira/browse/KAFKA-13587
 Project: Kafka
  Issue Type: Improvement
Reporter: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13489) Support different compression type for snapshots

2021-11-30 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13489:
--

 Summary: Support different compression type for snapshots
 Key: KAFKA-13489
 URL: https://issues.apache.org/jira/browse/KAFKA-13489
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-12932) Interfaces for SnapshotReader and SnapshotWriter

2021-11-30 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12932.

Resolution: Fixed

> Interfaces for SnapshotReader and SnapshotWriter
> 
>
> Key: KAFKA-12932
> URL: https://issues.apache.org/jira/browse/KAFKA-12932
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jose Armando Garcia Sancio
>Assignee: loboxu
>Priority: Major
>
> Change the snapshot API so that SnapshotWriter and SnapshotReader are 
> interfaces. Change the existing types SnapshotWriter and SnapshotReader to 
> use a different name and to implement the interfaces introduced by this issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13357) Controller snapshot contains producer ids records but broker does not

2021-11-24 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13357.

Resolution: Fixed

> Controller snapshot contains producer ids records but broker does not
> -
>
> Key: KAFKA-13357
> URL: https://issues.apache.org/jira/browse/KAFKA-13357
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kraft
>Affects Versions: 3.0.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Blocker
>
> MetadataDelta ignores PRODUCER_IDS_RECORDS. A broker doesn't need this state 
> for its operation. The broker needs to handle these records if we want to hold 
> the invariant that controller snapshots are equivalent to broker snapshots.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-12973) Update KIP and dev mailing list

2021-11-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12973.

Resolution: Fixed

> Update KIP and dev mailing list
> ---
>
> Key: KAFKA-12973
> URL: https://issues.apache.org/jira/browse/KAFKA-12973
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>
> Update KIP-630 and the Kafka mailing list based on the small implementation 
> deviations from what is documented in the KIP.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13357) Controller snapshot contains producer ids records but broker does not

2021-10-06 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13357:
--

 Summary: Controller snapshot contains producer ids records but 
broker does not
 Key: KAFKA-13357
 URL: https://issues.apache.org/jira/browse/KAFKA-13357
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Affects Versions: 3.0.0
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


MetadataDelta ignores PRODUCER_IDS_RECORDS. A broker doesn't need this state 
for its operation. The broker needs to handle these records if we want to hold 
the invariant that controller snapshots are equivalent to broker snapshots.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13321) Notify listener of leader change on registration

2021-09-23 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13321:
--

 Summary: Notify listener of leader change on registration
 Key: KAFKA-13321
 URL: https://issues.apache.org/jira/browse/KAFKA-13321
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


When a Listener is registered with the RaftClient, the RaftClient doesn't 
notify the listener of the current leader when it is a follower. The current 
implementation of RaftClient only notifies the listener of the leader change if 
it is the current leader and has caught up to the leader epoch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13208) Use TopicIdPartition instead of TopicPartition when computing the topic delta

2021-08-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13208:
--

 Summary: Use TopicIdPartition instead of TopicPartition when 
computing the topic delta
 Key: KAFKA-13208
 URL: https://issues.apache.org/jira/browse/KAFKA-13208
 Project: Kafka
  Issue Type: Improvement
  Components: kraft, replication
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


{{TopicPartition}} is used as the key when computing the local changes in 
{{TopicsDelta}}. The topic id is included in the Map value returned by 
{{localChanges}}. I think that the handling of this code and the corresponding 
code in {{ReplicaManager}} could be simplified if {{localChanges}} instead 
returned something like
{code:java}
case class LocalReplicaChanges(
  deletes: Set[TopicIdPartition],
  leaders: Map[TopicIdPartition, PartitionRegistration],
  followers: Map[TopicIdPartition, PartitionRegistration]
){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13198) TopicsDelta doesn't update deleted topic when processing PartitionChangeRecord

2021-08-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13198:
--

 Summary: TopicsDelta doesn't update deleted topic when processing 
PartitionChangeRecord
 Key: KAFKA-13198
 URL: https://issues.apache.org/jira/browse/KAFKA-13198
 Project: Kafka
  Issue Type: Bug
  Components: kraft, replication
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


In KRaft, when a replica gets reassigned away from a topic partition we are not 
notifying the {{ReplicaManager}} to stop the replica.

One solution is to track those topic partition ids when processing 
{{PartitionChangeRecord}} and to return them as {{deleted}} when the replica 
manager calls {{calculateDeltaChanges}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13193) Replica manager doesn't update partition state when transitioning from leader to follower with unknown leader

2021-08-11 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13193:
--

 Summary: Replica manager doesn't update partition state when 
transitioning from leader to follower with unknown leader
 Key: KAFKA-13193
 URL: https://issues.apache.org/jira/browse/KAFKA-13193
 Project: Kafka
  Issue Type: Bug
  Components: kraft, replication
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


This issue applies to both the ZK and KRaft implementations of the replica 
manager. In the rare case when a replica transitions from leader to follower 
with no leader, the partition state is not updated.

This is because when handling makeFollowers the ReplicaManager only updates the 
partition state if the leader is alive. The solution is to always transition to 
follower but not start the fetcher thread if the leader is unknown or not alive.
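A hedged sketch of the proposed behavior (illustrative names, not the actual
ReplicaManager code): always apply the follower transition, and only start the
fetcher when the leader is known and alive.
{code:java}
def makeFollower(partition: String,
                 leaderId: Option[Int],
                 aliveBrokers: Set[Int],
                 updateState: String => Unit,
                 startFetcher: (String, Int) => Unit): Unit = {
  // Always record the leader-to-follower transition, even with no leader.
  updateState(partition)
  // Only fetch once the leader is known and alive.
  leaderId.filter(aliveBrokers.contains).foreach(startFetcher(partition, _))
}
{code}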



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13182) Input to AbstractFetcherManager::addFetcherForPartition could be simplified

2021-08-09 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13182:
--

 Summary: Input to AbstractFetcherManager::addFetcherForPartition 
could be simplified
 Key: KAFKA-13182
 URL: https://issues.apache.org/jira/browse/KAFKA-13182
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


The input to the addFetcherForPartition method in AbstractFetcherManager 
includes more information than it needs. The fetcher manager only needs the 
leader id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13181) ReplicaManager should start fetchers on UnfencedBrokerRecords

2021-08-09 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13181:
--

 Summary: ReplicaManager should start fetchers on 
UnfencedBrokerRecords
 Key: KAFKA-13181
 URL: https://issues.apache.org/jira/browse/KAFKA-13181
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft, replication
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


The KRaft ReplicaManager starts fetching from the leader if it is a follower 
and there is an endpoint for the leader.

We need to improve the ReplicaManager to also start fetching when the leader 
registers and gets unfenced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13168) KRaft observers should not have a replica id

2021-08-04 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13168:
--

 Summary: KRaft observers should not have a replica id
 Key: KAFKA-13168
 URL: https://issues.apache.org/jira/browse/KAFKA-13168
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.0.0


To avoid a misconfiguration of a broker affecting the quorum of the cluster 
metadata partition, when a Kafka node is configured as broker-only the replica 
id for the KRaft client should be set to {{Optional::empty()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13165) Validate node id, process role and quorum voters

2021-08-04 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13165:
--

 Summary: Validate node id, process role and quorum voters
 Key: KAFKA-13165
 URL: https://issues.apache.org/jira/browse/KAFKA-13165
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio


Under certain configurations it is possible for the Kafka Server to boot up as a 
broker only but be the cluster metadata quorum leader. We should validate the 
configuration to avoid this case; see the sketch after this list.
 # If the {{process.roles}} contains {{controller}} then the {{node.id}} needs 
to be in the {{controller.quorum.voters}}
 # If the {{process.roles}} doesn't contain {{controller}} then the {{node.id}} 
cannot be in the {{controller.quorum.voters}}
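A minimal sketch of those two rules, in Scala (the configuration names follow
the description; this is not the actual KafkaConfig validation code):
{code:java}
def validateRolesAndVoters(nodeId: Int,
                           processRoles: Set[String],
                           quorumVoterIds: Set[Int]): Unit = {
  val isController = processRoles.contains("controller")
  if (isController && !quorumVoterIds.contains(nodeId))
    throw new IllegalArgumentException(
      s"node.id $nodeId must be in controller.quorum.voters when process.roles contains 'controller'")
  if (!isController && quorumVoterIds.contains(nodeId))
    throw new IllegalArgumentException(
      s"node.id $nodeId must not be in controller.quorum.voters when process.roles does not contain 'controller'")
}
{code}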



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12646) Implement snapshot generation on brokers

2021-08-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12646.

Resolution: Fixed

> Implement snapshot generation on brokers
> 
>
> Key: KAFKA-12646
> URL: https://issues.apache.org/jira/browse/KAFKA-12646
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Major
>  Labels: kip-500
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12647) Implement loading snapshot in the broker

2021-08-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12647.

Resolution: Fixed

> Implement loading snapshot in the broker
> 
>
> Key: KAFKA-12647
> URL: https://issues.apache.org/jira/browse/KAFKA-12647
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jose Armando Garcia Sancio
>Assignee: Colin McCabe
>Priority: Major
>  Labels: kip-500
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12997) Expose log record append time to the controller/broker

2021-08-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12997.

Resolution: Fixed

> Expose log record append time to the controller/broker
> --
>
> Key: KAFKA-12997
> URL: https://issues.apache.org/jira/browse/KAFKA-12997
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Niket Goel
>Assignee: Jose Armando Garcia Sancio
>Priority: Minor
>  Labels: kip-500
>
> The snapshot records are generated by each individual quorum participant, 
> which also stamps the append time in the records. These append times are 
> generated from a different clock (except in the case of the quorum leader) 
> than the metadata log records (where timestamps are stamped by the 
> leader).
> To enable having a single clock to compare timestamps, 
> https://issues.apache.org/jira/browse/KAFKA-12952 adds a timestamp field to 
> the snapshot header which should contain the append time of the highest 
> record contained in the snapshot (which will be in leader time).
> This JIRA tracks exposing and wiring the batch timestamp such that it can be 
> provided to the SnapshotWriter at the time of snapshot creation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13157) Kafka-dump-log needs to support snapshot records

2021-08-03 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13157:
--

 Summary: Kafka-dump-log needs to support snapshot records
 Key: KAFKA-13157
 URL: https://issues.apache.org/jira/browse/KAFKA-13157
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


Extend the kafka-dump-log tool to allow the user to view and print KRaft 
snapshot files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13112) Controller's committed offset get out of sync with raft client listener context

2021-08-02 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13112.

Resolution: Fixed

Yes.

> Controller's committed offset get out of sync with raft client listener 
> context
> ---
>
> Key: KAFKA-13112
> URL: https://issues.apache.org/jira/browse/KAFKA-13112
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> The active controller creates an in-memory snapshot for every offset returned 
> by RaftClient::scheduleAppend and RaftClient::scheduleAtomicAppend. For 
> RaftClient::scheduleAppend, the RaftClient is free to split those records 
> into multiple batches. Because of this, when scheduleAppend is used there is 
> no guarantee that the active leader will always have an in-memory snapshot 
> for every "last committed offset".
> To get around this problem, when the active controller renounces leadership 
> and there is no snapshot at the last committed offset, it will instead:
>  # Reset the snapshot registry
>  # Unregister the listener from the RaftClient
>  # Register a new listener with the RaftClient



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13148) Kraft Controller doesn't handle scheduleAppend returning Long.MAX_VALUE

2021-07-28 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13148:
--

 Summary: Kraft Controller doesn't handle scheduleAppend returning 
Long.MAX_VALUE
 Key: KAFKA-13148
 URL: https://issues.apache.org/jira/browse/KAFKA-13148
 Project: Kafka
  Issue Type: Bug
  Components: controller, kraft
Reporter: Jose Armando Garcia Sancio


In some cases the RaftClient will return Long.MAX_VALUE:
{code:java}
/**
 * Append a list of records to the log. The write will be scheduled for some time
 * in the future. There is no guarantee that appended records will be written to
 * the log and eventually committed. However, it is guaranteed that if any of the
 * records become committed, then all of them will be.
 *
 * If the provided current leader epoch does not match the current epoch, which
 * is possible when the state machine has yet to observe the epoch change, then
 * this method will return {@link Long#MAX_VALUE} to indicate an offset which is
 * not possible to become committed. The state machine is expected to discard all
 * uncommitted entries after observing an epoch change.
 *
 * @param epoch the current leader epoch
 * @param records the list of records to append
 * @return the expected offset of the last record; {@link Long#MAX_VALUE} if the
 *         records could not be committed; null if no memory could be allocated
 *         for the batch at this time
 * @throws org.apache.kafka.common.errors.RecordBatchTooLargeException if the
 *         size of the records is greater than the maximum batch size; if this
 *         exception is thrown, none of the elements in records were committed
 */
Long scheduleAtomicAppend(int epoch, List<T> records);
{code}
The controller doesn't handle this case:
{code:java}
// If the operation returned a batch of records, those records need to be
// written before we can return our result to the user. Here, we hand off
// the batch of records to the raft client. They will be written out
// asynchronously.
final long offset;
if (result.isAtomic()) {
    offset = raftClient.scheduleAtomicAppend(controllerEpoch, result.records());
} else {
    offset = raftClient.scheduleAppend(controllerEpoch, result.records());
}
op.processBatchEndOffset(offset);
writeOffset = offset;
resultAndOffset = ControllerResultAndOffset.of(offset, result);
for (ApiMessageAndVersion message : result.records()) {
    replay(message.message(), Optional.empty(), offset);
}
snapshotRegistry.getOrCreateSnapshot(offset);
log.debug("Read-write operation {} will be completed when the log " +
    "reaches offset {}.", this, resultAndOffset.offset());
{code}
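A possible guard, continuing the quoted snippet (sketch only; {{NotControllerException}} is just one plausible reaction, and the actual fix may differ):
{code:java}
// Sketch: detect the sentinel before treating it as a real offset.
Long offset = result.isAtomic()
    ? raftClient.scheduleAtomicAppend(controllerEpoch, result.records())
    : raftClient.scheduleAppend(controllerEpoch, result.records());
if (offset == null || offset == Long.MAX_VALUE) {
    // Either no buffer memory was available or the epoch was stale; in
    // both cases these records will never commit under this epoch, so
    // fail the operation instead of waiting on an impossible offset.
    throw new NotControllerException("Unable to schedule append at epoch " + controllerEpoch);
}
op.processBatchEndOffset(offset);
{code}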
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13114) Unregister listener during renounce when the in-memory snapshot is missing

2021-07-20 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13114:
--

 Summary: Unregister listener during renounce when the in-memory 
snapshot is missing
 Key: KAFKA-13114
 URL: https://issues.apache.org/jira/browse/KAFKA-13114
 Project: Kafka
  Issue Type: Sub-task
  Components: controller
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


Need to improve the renounce logic to do the following when the in-memory 
snapshot at the last committed offset is missing:
 # Reset the snapshot registry
 # Unregister the listener from the RaftClient
 # Register a new listener with the RaftClient



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13113) Add unregister support to the RaftClient.

2021-07-20 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13113:
--

 Summary: Add unregister support to the RaftClient.
 Key: KAFKA-13113
 URL: https://issues.apache.org/jira/browse/KAFKA-13113
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


Implement the following API:
{code:java}
interface RaftClient<T> {
    ListenerContext register(Listener<T> listener);
    void unregister(ListenerContext context);
}

interface ListenerContext {
}

interface Listener<T> {
    void handleCommit(ListenerContext context, BatchReader<T> reader);
    void handleSnapshot(ListenerContext context, SnapshotReader<T> reader);
    void handleLeaderChange(ListenerContext context, LeaderAndEpoch leaderAndEpoch);
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13112) Controller's committed offset get out of sync with raft client listener context

2021-07-20 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13112:
--

 Summary: Controller's committed offset get out of sync with raft 
client listener context
 Key: KAFKA-13112
 URL: https://issues.apache.org/jira/browse/KAFKA-13112
 Project: Kafka
  Issue Type: Bug
  Components: controller, kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13104) Controller should notify the RaftClient when it resigns

2021-07-19 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13104:
--

 Summary: Controller should notify the RaftClient when it resigns
 Key: KAFKA-13104
 URL: https://issues.apache.org/jira/browse/KAFKA-13104
 Project: Kafka
  Issue Type: Bug
  Components: controller, kraft
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.0.0


{code:java}
private Throwable handleEventException(String name,
                                       Optional<Long> startProcessingTimeNs,
                                       Throwable exception) {
    // ...
    renounce();
    return new UnknownServerException(exception);
}
{code}
When the active controller encounters an event exception it attempts to 
renounce leadership. Unfortunately, this doesn't tell the {{RaftClient}} that 
it should give up leadership. This results in an inconsistent state, with the 
{{RaftClient}} as leader but the controller inactive.

We should change this implementation so that the active controller asks the 
{{RaftClient}} to resign. The active controller then waits until 
{{handleLeaderChange}} before calling {{renounce()}}.
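Assuming the raft client exposes a resign call, the reworked flow might look roughly like this (names illustrative):
{code:java}
// Sketch: ask the raft layer to give up leadership; defer the local
// renounce() until handleLeaderChange() observes the epoch change.
private Throwable handleEventException(String name,
                                       Optional<Long> startProcessingTimeNs,
                                       Throwable exception) {
    // ...
    raftClient.resign(curClaimEpoch);  // instead of calling renounce() here
    return new UnknownServerException(exception);
}
{code}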



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13100) Controller cannot revert to an in-memory snapshot

2021-07-17 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13100:
--

 Summary: Controller cannot revert to an in-memory snapshot
 Key: KAFKA-13100
 URL: https://issues.apache.org/jira/browse/KAFKA-13100
 Project: Kafka
  Issue Type: Bug
  Components: controller, kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


{code:java}
[2021-07-16 16:34:55,578] DEBUG [Controller 3002] Executing handleRenounce[3]. (org.apache.kafka.controller.QuorumController)
[2021-07-16 16:34:55,578] WARN [Controller 3002] Renouncing the leadership at oldEpoch 3 due to a metadata log event. Reverting to last committed offset 214. (org.apache.kafka.controller.QuorumController)
[2021-07-16 16:34:55,579] WARN [Controller 3002] org.apache.kafka.controller.QuorumController@646b1289: failed with unknown server exception RuntimeException at epoch -1 in 1510 us. Reverting to last committed offset 214. (org.apache.kafka.controller.QuorumController)
java.lang.RuntimeException: No snapshot for epoch 214. Snapshot epochs are: -1, 1, 3, 5, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 94, 96, 97, 107, 108, 112, 125, 126, 128, 135, 171, 208, 213
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at org.apache.kafka.timeline.SnapshotRegistry.revertToSnapshot(SnapshotRegistry.java:203)
    at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:784)
    at org.apache.kafka.controller.QuorumController.access$2500(QuorumController.java:121)
    at org.apache.kafka.controller.QuorumController$QuorumMetaLogListener.lambda$handleLeaderChange$3(QuorumController.java:769)
    at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:311)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
    at java.lang.Thread.run(Thread.java:748)
[2021-07-16 16:34:55,580] ERROR [Controller 3002] Unexpected exception in handleException (org.apache.kafka.queue.KafkaEventQueue)
java.lang.RuntimeException: No snapshot for epoch 214. Snapshot epochs are: -1, 1, 3, 5, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 94, 96, 97, 107, 108, 112, 125, 126, 128, 135, 171, 208, 213
    at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at org.apache.kafka.timeline.SnapshotRegistry.revertToSnapshot(SnapshotRegistry.java:203)
    at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:784)
    at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:287)
    at org.apache.kafka.controller.QuorumController.access$500(QuorumController.java:121)
    at org.apache.kafka.controller.QuorumController$ControlEvent.handleException(QuorumController.java:317)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:126)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
    at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13098) No such file exception when recovering snapshots in metadata log dir

2021-07-17 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13098.

Resolution: Fixed

> No such file exception when recovering snapshots in metadata log dir
> 
>
> Key: KAFKA-13098
> URL: https://issues.apache.org/jira/browse/KAFKA-13098
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> {code:java}
> RaftClusterTest > testCreateClusterAndCreateListDeleteTopic() FAILED
> java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/kafka-286994548094074875/broker_0_data0/@metadata-0/partition.metadata.tmp
> at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
> at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
> at java.util.Iterator.forEachRemaining(Iterator.java:115)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> at kafka.raft.KafkaMetadataLog$.recoverSnapshots(KafkaMetadataLog.scala:616)
> at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:583)
> at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:257)
> at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:132)
> at kafka.testkit.KafkaClusterTestKit$Builder.build(KafkaClusterTestKit.java:227)
> at kafka.server.RaftClusterTest.testCreateClusterAndCreateListDeleteTopic(RaftClusterTest.scala:87)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13098) No such file exception when recovering snapshots in metadata log dir

2021-07-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13098:
--

 Summary: No such file exception when recovering snapshots in 
metadata log dir
 Key: KAFKA-13098
 URL: https://issues.apache.org/jira/browse/KAFKA-13098
 Project: Kafka
  Issue Type: Bug
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


{code:java}
RaftClusterTest > testCreateClusterAndCreateListDeleteTopic() FAILED
java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/kafka-286994548094074875/broker_0_data0/@metadata-0/partition.metadata.tmp
    at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
    at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at kafka.raft.KafkaMetadataLog$.recoverSnapshots(KafkaMetadataLog.scala:616)
    at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:583)
    at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:257)
    at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:132)
    at kafka.testkit.KafkaClusterTestKit$Builder.build(KafkaClusterTestKit.java:227)
    at kafka.server.RaftClusterTest.testCreateClusterAndCreateListDeleteTopic(RaftClusterTest.scala:87)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13078) Closing FileRawSnapshotWriter too early

2021-07-14 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13078.

Resolution: Fixed

> Closing FileRawSnapshotWriter too early
> ---
>
> Key: KAFKA-13078
> URL: https://issues.apache.org/jira/browse/KAFKA-13078
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.0.0
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> We are getting the following error
> {code:java}
>   [2021-07-13 17:23:42,174] ERROR [kafka-raft-io-thread]: Error due to 
> (kafka.raft.KafkaRaftManager$RaftIoThread)
>   java.io.UncheckedIOException: Error calculating snapshot size. temp path = 
> /mnt/kafka/kafka-metadata-logs/@metadata-0/0062-02-3249768281228588378.checkpoint.part,
>  snapshotId = OffsetAndEpoch(offset=62, epoch=2).
>   at 
> org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:63)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.maybeSendFetchOrFetchSnapshot(KafkaRaftClient.java:2044)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.pollFollowerAsObserver(KafkaRaftClient.java:2032)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.pollFollower(KafkaRaftClient.java:1995)
>   at 
> org.apache.kafka.raft.KafkaRaftClient.pollCurrentState(KafkaRaftClient.java:2104)
>   at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2217)
>   at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
>   Caused by: java.nio.channels.ClosedChannelException
>   at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
>   at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300)
>   at 
> org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:60)
>   ... 7 more
>  {code}
> This is because the {{FollowerState}} is closing the snapshot writer passed 
> through the argument instead of the one being replaced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13080) Fetch snapshot request are not directed to kraft in controller

2021-07-14 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13080.

Resolution: Fixed

> Fetch snapshot request are not directed to kraft in controller
> --
>
> Key: KAFKA-13080
> URL: https://issues.apache.org/jira/browse/KAFKA-13080
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Blocker
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> KRaft followers and observers are seeing the following error
> {code:java}
> [2021-07-13 18:15:47,289] ERROR [RaftManager nodeId=2] Unexpected error 
> UNKNOWN_SERVER_ERROR in FETCH_SNAPSHOT response: 
> InboundResponse(correlationId=29862, 
> data=FetchSnapshotResponseData(throttleTimeMs=0, errorCode=-1, topics=[]), 
> sourceId=3001) (org.apache.kafka.raft.KafkaRaftClient) {code}
> This is because ControllerApis is not directing FetchSnapshot requests to the 
> raft manager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13090) Improve cluster snapshot integration test

2021-07-14 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13090:
--

 Summary: Improve cluster snapshot integration test
 Key: KAFKA-13090
 URL: https://issues.apache.org/jira/browse/KAFKA-13090
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


Extend the test in RaftClusterSnapshotTest to verify that both the controllers 
and the brokers are generating snapshots.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13089) Revisit the usage of BufferSuppliers in Kraft

2021-07-14 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13089:
--

 Summary: Revisit the usage of BufferSuppliers in Kraft
 Key: KAFKA-13089
 URL: https://issues.apache.org/jira/browse/KAFKA-13089
 Project: Kafka
  Issue Type: Sub-task
  Components: kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio


The latest KafkaRaftClient creates a new BufferSupplier every time one is 
needed. A buffer supplier is needed when reading from the log and when reading 
from a snapshot.

It would be good to investigate whether there is a performance and memory usage 
advantage to sharing the buffer supplier between those use cases and across 
every read of the log or a snapshot.

If the BufferSupplier is shared, it is very likely that the implementation will 
have to be thread-safe, because we need to support multiple Listeners and each 
Listener would be using a different thread.
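If sharing is pursued, a thread-safe pool with the same get/release shape could be a starting point; a minimal sketch (not the actual {{BufferSupplier}} implementation):
{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedDeque;

// Sketch of a thread-safe pool with a get/release shape similar to
// Kafka's BufferSupplier, simplified to a single free list.
final class ConcurrentBufferSupplier {
    private final ConcurrentLinkedDeque<ByteBuffer> free = new ConcurrentLinkedDeque<>();

    ByteBuffer get(int capacity) {
        ByteBuffer cached = free.pollFirst();
        if (cached != null && cached.capacity() >= capacity) {
            cached.clear();
            return cached;
        }
        // Too small or nothing cached: allocate fresh; an undersized
        // cached buffer is simply dropped and reclaimed by the GC.
        return ByteBuffer.allocate(capacity);
    }

    void release(ByteBuffer buffer) {
        buffer.clear();
        free.addFirst(buffer);
    }
}
{code}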



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13080) Fetch snapshot request are not directed to kraft in controller

2021-07-13 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13080:
--

 Summary: Fetch snapshot request are not directed to kraft in 
controller
 Key: KAFKA-13080
 URL: https://issues.apache.org/jira/browse/KAFKA-13080
 Project: Kafka
  Issue Type: Bug
  Components: controller, kraft
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


KRaft followers and observers are seeing the following error
{code:java}
[2021-07-13 18:15:47,289] ERROR [RaftManager nodeId=2] Unexpected error 
UNKNOWN_SERVER_ERROR in FETCH_SNAPSHOT response: 
InboundResponse(correlationId=29862, 
data=FetchSnapshotResponseData(throttleTimeMs=0, errorCode=-1, topics=[]), 
sourceId=3001) (org.apache.kafka.raft.KafkaRaftClient) {code}
This is because ControllerApis is not directing FetchSnapshot requests to the 
raft manager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13078) Closing FileRawSnapshotWriter too early

2021-07-13 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13078:
--

 Summary: Closing FileRawSnapshotWriter too early
 Key: KAFKA-13078
 URL: https://issues.apache.org/jira/browse/KAFKA-13078
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.0.0
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


We are getting the following error
{code:java}
  [2021-07-13 17:23:42,174] ERROR [kafka-raft-io-thread]: Error due to 
(kafka.raft.KafkaRaftManager$RaftIoThread)
  java.io.UncheckedIOException: Error calculating snapshot size. temp path = 
/mnt/kafka/kafka-metadata-logs/@metadata-0/0062-02-3249768281228588378.checkpoint.part,
 snapshotId = OffsetAndEpoch(offset=62, epoch=2).
  at 
org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:63)
  at 
org.apache.kafka.raft.KafkaRaftClient.maybeSendFetchOrFetchSnapshot(KafkaRaftClient.java:2044)
  at 
org.apache.kafka.raft.KafkaRaftClient.pollFollowerAsObserver(KafkaRaftClient.java:2032)
  at 
org.apache.kafka.raft.KafkaRaftClient.pollFollower(KafkaRaftClient.java:1995)
  at 
org.apache.kafka.raft.KafkaRaftClient.pollCurrentState(KafkaRaftClient.java:2104)
  at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2217)
  at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
  Caused by: java.nio.channels.ClosedChannelException
  at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
  at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:300)
  at 
org.apache.kafka.snapshot.FileRawSnapshotWriter.sizeInBytes(FileRawSnapshotWriter.java:60)
  ... 7 more
 {code}
This is because the {{FollowerState}} is closing the snapshot writer passed 
through the argument instead of the one being replaced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13074) Implement maybeClean for MockLog

2021-07-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13074:
--

 Summary: Implement maybeClean for MockLog
 Key: KAFKA-13074
 URL: https://issues.apache.org/jira/browse/KAFKA-13074
 Project: Kafka
  Issue Type: Bug
Reporter: Jose Armando Garcia Sancio


The current implementation of MockLog doesn't implement maybeClean. It is 
expected that MockLog has the same semantics as KafkaMetadataLog. This is 
assumed to be true by a few of the test suites, like the raft simulation and 
the Kafka raft client test context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13073) Simulation test fails due to inconsistency in MockLog's implementation

2021-07-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13073:
--

 Summary: Simulation test fails due to inconsistency in MockLog's 
implementation
 Key: KAFKA-13073
 URL: https://issues.apache.org/jira/browse/KAFKA-13073
 Project: Kafka
  Issue Type: Bug
  Components: controller, replication
Affects Versions: 3.0.0
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


We are getting the following error on trunk
{code:java}
RaftEventSimulationTest > canRecoverAfterAllNodesKilled STANDARD_OUT
timestamp = 2021-07-12T16:26:55.663, 
RaftEventSimulationTest:canRecoverAfterAllNodesKilled =
  java.lang.RuntimeException:
Uncaught exception during poll of node 1
  |---jqwik---
tries = 25| # of calls to property
checks = 25   | # of not rejected calls
generation = RANDOMIZED   | parameters are randomly generated
after-failure = PREVIOUS_SEED | use the previous seed
when-fixed-seed = ALLOW   | fixing the random seed is allowed
edge-cases#mode = MIXIN   | edge cases are mixed in
edge-cases#total = 108| # of all combined edge cases
edge-cases#tried = 4  | # of edge cases tried in current run
seed = 8079861963960994566| random seed to reproduce generated values   
 Sample
--
  arg0: 4002
  arg1: 2
  arg2: 4{code}
I think there are a couple of issues here:
 # The {{ListenerContext}} for {{KafkaRaftClient}} uses the value returned by 
{{ReplicatedLog::startOffset()}} to determine the log start and when to load a 
snapshot, while the {{MockLog}} implementation uses {{logStartOffset}}, which 
could be a different value.
 # {{MockLog}} doesn't implement {{ReplicatedLog::maybeClean}}, so the log 
start offset is always 0.
 # The snapshot id validation for {{MockLog}} and {{KafkaMetadataLog}}'s 
{{createNewSnapshot}} throws an exception when the snapshot id is less than the 
log start offset.

Solutions:

To fix the error quoted above we only need to fix bullet point 3, but I think 
we should fix all of the issues enumerated in this Jira.

For 1. we should change the {{MockLog}} implementation so that it uses 
{{startOffset}} both externally and internally.

For 2. I will file another issue to track this implementation.

For 3. I think this validation is too strict. It is safe to simply ignore any 
attempt by the state machine to create a snapshot with an id less than the log 
start offset. We should return an {{Optional.empty()}} when the snapshot id is 
less than the log start offset, which tells the user that it doesn't need to 
generate a snapshot for that offset.
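A sketch of the relaxed check for point 3 ({{storeSnapshot}} and the exact method shape are assumptions):
{code:java}
// Sketch only: return Optional.empty() for stale snapshot ids instead
// of throwing, so callers know no snapshot is needed at that offset.
public Optional<RawSnapshotWriter> createNewSnapshot(OffsetAndEpoch snapshotId) {
    if (snapshotId.offset < startOffset()) {
        // The log already starts after this offset; a snapshot there
        // would be immediately eligible for deletion.
        return Optional.empty();
    }
    return Optional.of(storeSnapshot(snapshotId));  // hypothetical helper
}
{code}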



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12974) Change the default for snapshot generation configuration

2021-07-01 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12974.

Resolution: Fixed

Already fixed. Default set to 20MB.

> Change the default for snapshot generation configuration
> 
>
> Key: KAFKA-12974
> URL: https://issues.apache.org/jira/browse/KAFKA-12974
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Jose Armando Garcia Sancio
>Priority: Blocker
>
> In PR https://github.com/apache/kafka/pull/10812 the default for the 
> {{metadata.log.snapshot.min.new_record.bytes}} is set to {{Int.MaxValue}}. 
> This was done to disable snapshot generation by default since snapshot 
> loading is not implemented on the broker.
> This value should be changed to something much smaller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12863) Configure controller snapshot generation

2021-07-01 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12863.

Fix Version/s: 3.0.0
   Resolution: Fixed

> Configure controller snapshot generation
> 
>
> Key: KAFKA-12863
> URL: https://issues.apache.org/jira/browse/KAFKA-12863
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: kip-500
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12952) Metadata Snapshot File Delimiters

2021-07-01 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12952.

Fix Version/s: 3.0.0
   Resolution: Fixed

> Metadata Snapshot File Delimiters
> -
>
> Key: KAFKA-12952
> URL: https://issues.apache.org/jira/browse/KAFKA-12952
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller, kraft
>Reporter: Niket Goel
>Assignee: Niket Goel
>Priority: Minor
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> Create new Control Records that will serve as the header and footer for a 
> Metadata Snapshot File. These records will be contained at the beginning and 
> end of each Snapshot File, and can be checked to verify completeness of a 
> snapshot file.
> The following fields are proposed for the header:
> # *Version:* Schema version for the snapshot header
> # *Last Contained Log Time:* The append time of the highest record contained 
> in this snapshot
> # *End Offset:* End offset of the snapshot, from the snapshot ID
> # *Epoch:* Epoch of the snapshot, from the snapshot ID
> # *Creator ID:* (Optional) ID of the broker/controller that created the 
> snapshot
> # *Cluster ID:* (Optional) ID of the cluster that created the snapshot
> # *Create Time:* Timestamp of the snapshot creation (might not be needed, as 
> each record batch already has a timestamp)
> The following fields are proposed for the footer:
> # *Version:* Schema version of the snapshot footer (same as header)
> # *Record Type:* A type field indicating this is the end record for the 
> snapshot file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13020) SnapshotReader should decode and report the append time in the header

2021-06-30 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13020:
--

 Summary: SnapshotReader should decode and report the append time in 
the header
 Key: KAFKA-13020
 URL: https://issues.apache.org/jira/browse/KAFKA-13020
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13006) Remove the method RaftClient.leaderAndEpoch

2021-06-28 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13006:
--

 Summary: Remove the method RaftClient.leaderAndEpoch
 Key: KAFKA-13006
 URL: https://issues.apache.org/jira/browse/KAFKA-13006
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


There are semantic differences between {{RaftClient.leaderAndEpoch}} and 
{{RaftClient.Listener.handleLeaderChange}}, especially when the raft client 
transitions from follower to leader. To simplify the API, I think that we 
should remove the method {{RaftClient.leaderAndEpoch}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12992) Make kraft configuration properties public

2021-06-24 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12992:
--

 Summary: Make kraft configuration properties public
 Key: KAFKA-12992
 URL: https://issues.apache.org/jira/browse/KAFKA-12992
 Project: Kafka
  Issue Type: Sub-task
  Components: core
Reporter: Jose Armando Garcia Sancio
 Fix For: 3.0.0


All of the KRaft configurations should be made public:
{code:java}
/*
 * KRaft mode configs. Note that these configs are defined as internal.
 * We will make them public in the 3.0.0 release.
 */
.defineInternal(ProcessRolesProp, LIST, Collections.emptyList(), ValidList.in("broker", "controller"), HIGH, ProcessRolesDoc)
.defineInternal(NodeIdProp, INT, Defaults.EmptyNodeId, null, HIGH, NodeIdDoc)
.defineInternal(InitialBrokerRegistrationTimeoutMsProp, INT, Defaults.InitialBrokerRegistrationTimeoutMs, null, MEDIUM, InitialBrokerRegistrationTimeoutMsDoc)
.defineInternal(BrokerHeartbeatIntervalMsProp, INT, Defaults.BrokerHeartbeatIntervalMs, null, MEDIUM, BrokerHeartbeatIntervalMsDoc)
.defineInternal(BrokerSessionTimeoutMsProp, INT, Defaults.BrokerSessionTimeoutMs, null, MEDIUM, BrokerSessionTimeoutMsDoc)
.defineInternal(MetadataLogDirProp, STRING, null, null, HIGH, MetadataLogDirDoc)
.defineInternal(ControllerListenerNamesProp, STRING, null, null, HIGH, ControllerListenerNamesDoc)
.defineInternal(SaslMechanismControllerProtocolProp, STRING, SaslConfigs.DEFAULT_SASL_MECHANISM, null, HIGH, SaslMechanismControllerProtocolDoc)
{code}

https://github.com/apache/kafka/blob/2beaf9a720330615bc5474ec079f8b4b105eff91/core/src/main/scala/kafka/server/KafkaConfig.scala#L1043-L1053



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12982) Notify listeners of raft client shutdowns

2021-06-22 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12982:
--

 Summary: Notify listeners of raft client shutdowns
 Key: KAFKA-12982
 URL: https://issues.apache.org/jira/browse/KAFKA-12982
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


`RaftClient.Listener.beginShutdown` should be called when the `RaftClient` is 
shutting down. I think there should be two ways to terminate the `RaftClient`: 
`shutdown` and `close`.

It looks like the current code for `close` only closes the metrics registry. It 
doesn't notify the listeners that the raft client was closed, and it doesn't 
stop future `poll` calls from updating the raft client.

There is also an assumption that `shutdown` can only be called once. To satisfy 
this we should remove the method from `RaftClient` and keep it as an 
implementation method in `KafkaRaftClient`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12974) Change the default for snapshot generation configuration

2021-06-21 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12974:
--

 Summary: Change the default for snapshot generation configuration
 Key: KAFKA-12974
 URL: https://issues.apache.org/jira/browse/KAFKA-12974
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


In PR https://github.com/apache/kafka/pull/10812 the default for 
{{metadata.log.snapshot.min.new_record.bytes}} is set to {{Int.MaxValue}}. This 
was done to disable snapshot generation by default since snapshot loading is 
not implemented on the broker.

This value should be changed to something much smaller.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12973) Update KIP and dev mailing list

2021-06-21 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12973:
--

 Summary: Update KIP and dev mailing list
 Key: KAFKA-12973
 URL: https://issues.apache.org/jira/browse/KAFKA-12973
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


Update KIP-630 and the Kafka mailing list based on the small implementation 
deviations from what is documented in the KIP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12968) Add integration tests for "test-kraft-server-start"

2021-06-17 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12968:
--

 Summary: Add integration tests for "test-kraft-server-start"
 Key: KAFKA-12968
 URL: https://issues.apache.org/jira/browse/KAFKA-12968
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12958) Add simulation invariant for leadership and snapshot

2021-06-16 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12958:
--

 Summary: Add simulation invariant for leadership and snapshot
 Key: KAFKA-12958
 URL: https://issues.apache.org/jira/browse/KAFKA-12958
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


During the simulation we should add an invariant that notified leaders are 
never asked to load snapshots. The state machine always sees the following 
sequence of callback calls:

Leaders see:
...
handleLeaderChange: the state machine is notified of leadership
handleSnapshot: never called

Non-leaders see:
...
handleLeaderChange: the state machine is notified that it is not the leader
handleSnapshot: called 0 or more times



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12932) Interfaces for SnapshotReader and SnapshotWriter

2021-06-10 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12932:
--

 Summary: Interfaces for SnapshotReader and SnapshotWriter
 Key: KAFKA-12932
 URL: https://issues.apache.org/jira/browse/KAFKA-12932
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


Change the snapshot API so that SnapshotWriter and SnapshotReader are 
interfaces. Rename the existing SnapshotWriter and SnapshotReader types and 
have them implement the interfaces introduced by this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12908) Load snapshot heuristic

2021-06-07 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12908:
--

 Summary: Load snapshot heuristic
 Key: KAFKA-12908
 URL: https://issues.apache.org/jira/browse/KAFKA-12908
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


The {{KafkaRaftClient}} implementation forces the {{RaftClient.Listener}} to 
load a snapshot only when the listener's next offset is less than the log 
start offset.

This is technically correct, but in some cases it may be more efficient to 
load a snapshot even when the next offset exists in the log. This is clearly 
true when the latest snapshot has fewer entries than the number of records 
between the next offset and the latest snapshot.
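A sketch of such a heuristic (all names are assumptions; the entry count would need a cheap estimate in practice):
{code:java}
// Illustrative: prefer the snapshot when replaying the log from
// nextOffset would touch more records than the snapshot contains.
final class SnapshotLoadHeuristic {
    static boolean shouldLoadSnapshot(long nextOffset,
                                      long snapshotEndOffset,
                                      long estimatedSnapshotEntries) {
        long recordsToReplay = snapshotEndOffset - nextOffset;
        return nextOffset < snapshotEndOffset       // snapshot is ahead of us
            && estimatedSnapshotEntries < recordsToReplay;
    }
}
{code}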



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12873) Log truncation due to divergence should also remove snapshots

2021-06-01 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12873:
--

 Summary: Log truncation due to divergence should also remove 
snapshots
 Key: KAFKA-12873
 URL: https://issues.apache.org/jira/browse/KAFKA-12873
 Project: Kafka
  Issue Type: Sub-task
  Components: log
Reporter: Jose Armando Garcia Sancio


It should not be possible for log truncation to truncate past the 
high-watermark, and we know that snapshots are always at offsets less than the 
high-watermark.

Having said that, I think we should add code that removes any snapshot that is 
greater than the log end offset after a log truncation. Currently the code 
that does log truncation is in `KafkaMetadataLog::truncateTo`.
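A defensive sketch of that cleanup, in Java rather than the Scala of `KafkaMetadataLog` (the bookkeeping names are assumptions):
{code:java}
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative stand-in for KafkaMetadataLog's snapshot bookkeeping.
final class SnapshotCleanupSketch {
    record OffsetAndEpoch(long offset, int epoch) {}

    private final ConcurrentSkipListMap<Long, OffsetAndEpoch> snapshots =
        new ConcurrentSkipListMap<>();
    private long logEndOffset;

    void truncateTo(long offset) {
        logEndOffset = Math.min(logEndOffset, offset);
        // Remove every snapshot whose end offset now lies past the log end.
        snapshots.tailMap(logEndOffset, false).clear();
    }
}
{code}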



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12863) Configure controller snapshot generation

2021-05-28 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12863:
--

 Summary: Configure controller snapshot generation
 Key: KAFKA-12863
 URL: https://issues.apache.org/jira/browse/KAFKA-12863
 Project: Kafka
  Issue Type: Sub-task
  Components: controller
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12837) Process entire batch in broker metadata listener

2021-05-21 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12837:
--

 Summary: Process entire batch in broker metadata listener
 Key: KAFKA-12837
 URL: https://issues.apache.org/jira/browse/KAFKA-12837
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


Currently the BrokerMetadataListener processes one batch at a time even though 
it is possible for the BatchReader to contain more than one batch. This is 
functionally correct, but it would require less coordination between the 
RaftIOThread and the broker metadata listener thread if the broker were 
changed to process all of the batches included in the BatchReader sent through 
handleCommit.
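Assuming BatchReader iterates over every delivered batch, the listener could drain it in one pass; replay is a hypothetical stand-in for applying a record to the metadata image:
{code:java}
// Sketch: process every batch handed to a single handleCommit call,
// instead of handing one batch at a time back to the raft IO thread.
public void handleCommit(BatchReader<ApiMessageAndVersion> reader) {
    try {
        while (reader.hasNext()) {
            Batch<ApiMessageAndVersion> batch = reader.next();
            for (ApiMessageAndVersion message : batch.records()) {
                replay(message);  // hypothetical: apply to the metadata image
            }
        }
    } finally {
        reader.close();  // releases the reader's underlying buffers
    }
}
{code}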



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12342) Get rid of raft/meta log shim layer

2021-05-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12342.

Resolution: Fixed

> Get rid of raft/meta log shim layer
> ---
>
> Key: KAFKA-12342
> URL: https://issues.apache.org/jira/browse/KAFKA-12342
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>  Labels: kip-500
>
> We currently use a shim to bridge the interface differences between 
> `RaftClient` and `MetaLogManager`. We need to converge the two interfaces and 
> get rid of the shim.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12543) Re-design the ownership model for snapshots

2021-05-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-12543.

Resolution: Fixed

> Re-design the ownership model for snapshots
> ---
>
> Key: KAFKA-12543
> URL: https://issues.apache.org/jira/browse/KAFKA-12543
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>
> With the current implementation, {{RawSnapshotReader}} are created and closed 
> by the {{KafkaRaftClient}} as needed to satisfy {{FetchSnapshot}} requests. 
> This means that for {{FileRawSnapshotReader}} they are closed before the 
> network client has had a chance to send the bytes over the network.
> One way to fix this is to make the {{KafkaMetadataLog}} the owner of the 
> {{FileRawSnapshotReader}}. Once a {{FileRawSnapshotReader}} is created it 
> will stay open until the snapshot is deleted by 
> {{ReplicatedLog::deleteBeforeSnapshot}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12787) Configure and integrate controller snapshot with the RaftClient

2021-05-14 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12787:
--

 Summary: Configure and integrate controller snapshot with the 
RaftClient
 Key: KAFKA-12787
 URL: https://issues.apache.org/jira/browse/KAFKA-12787
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12773) Use UncheckedIOException when wrapping IOException

2021-05-11 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12773:
--

 Summary: Use UncheckedIOException when wrapping IOException
 Key: KAFKA-12773
 URL: https://issues.apache.org/jira/browse/KAFKA-12773
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jose Armando Garcia Sancio


Use UncheckedIOException when wrapping IOException instead of RuntimeException.
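The pattern is the standard one from java.io; for example:
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class WrapIoExample {
    // Preserves the IOException as the cause instead of hiding it
    // behind a bare RuntimeException.
    static long sizeInBytes(Path path) {
        try {
            return Files.size(path);
        } catch (IOException e) {
            throw new UncheckedIOException("Error reading size of " + path, e);
        }
    }
}
{code}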



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12668) MockScheduler is not safe to use in concurrent code.

2021-04-14 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12668:
--

 Summary: MockScheduler is not safe to use in concurrent code.
 Key: KAFKA-12668
 URL: https://issues.apache.org/jira/browse/KAFKA-12668
 Project: Kafka
  Issue Type: Improvement
  Components: unit tests
Reporter: Jose Armando Garcia Sancio


The current implementation of MockScheduler executes tasks on the same stack 
when schedule is called. This violates Log's assumption, since Log calls 
schedule while holding a lock. This can cause deadlocks in tests.

One solution is to change MockScheduler's schedule method so that tick is not 
called; tick should instead be called by a stack (thread) that doesn't hold 
any locks.
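One way to decouple the two, sketched with illustrative names:
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: schedule() only enqueues; tick() is invoked later by a caller
// that holds no locks, so tasks never run inside Log's critical section.
final class DeferredMockScheduler {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    void schedule(Runnable task) {
        pending.add(task);  // no execution on the caller's stack
    }

    void tick() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }
}
{code}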



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12646) Implement loading snapshot in the controller

2021-04-09 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12646:
--

 Summary: Implement loading snapshot in the controller
 Key: KAFKA-12646
 URL: https://issues.apache.org/jira/browse/KAFKA-12646
 Project: Kafka
  Issue Type: Sub-task
  Components: controller
Reporter: Jose Armando Garcia Sancio






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >