[VOTE] KIP-855: Add schema.namespace parameter to SetSchemaMetadata SMT in Kafka Connect

2022-07-27 Thread Michael Negodaev
I would like to start a vote on my design to add a "schema.namespace"
parameter to the SetSchemaMetadata Single Message Transform in Kafka Connect.

KIP URL: https://cwiki.apache.org/confluence/x/CiT1D
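
For anyone unfamiliar with the transform, here is a sketch of how it might be
configured once the KIP lands. The schema.name and schema.version options exist
in the current SMT; schema.namespace is the new option proposed here, and its
exact semantics are defined on the KIP page above:

# Connector configuration sketch; only schema.namespace is new
transforms=SetSchema
transforms.SetSchema.type=org.apache.kafka.connect.transforms.SetSchemaMetadata$Value
transforms.SetSchema.schema.name=OrderValue
transforms.SetSchema.schema.version=2
# Proposed by KIP-855:
transforms.SetSchema.schema.namespace=com.example.orders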


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1102

2022-07-27 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 486773 lines...]
[2022-07-28T02:55:11.279Z] > Task :connect:api:testJar
[2022-07-28T02:55:11.279Z] > Task :connect:api:testSrcJar
[2022-07-28T02:55:11.279Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2022-07-28T02:55:11.279Z] > Task :connect:api:publishToMavenLocal
[2022-07-28T02:55:13.066Z] 
[2022-07-28T02:55:13.066Z] > Task :streams:javadoc
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:890:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:919:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:939:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:854:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:890:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:919:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:939:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:84:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:136:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:147:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:101:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:167:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:58:
 warning - Tag @link: missing '#': "org.apache.kafka.streams.StreamsBuilder()"
[2022-07-28T02:55:13.066Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:58:
 warning - Tag @link: can't find org.apache.kafka.streams.StreamsBuilder() in 
org.apache.kafka.streams.TopologyConfig
[2022-07-28T02:55:14.011Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyDescription.java:38:
 warning - Tag @link: reference not found: ProcessorContext#forward(Object, 
Object) forwards
[2022-07-28T02:55:14.011Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/Position.java:44:
 warning - Tag @link: can't find query(Query,
[2022-07-28T02:55:14.011Z]  PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-07-28T02:55:14.011Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:109:
 warning - Tag @link: reference not found: this#getResult()
[2022-07-28T02:55:14.011Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:116:
 warning - Tag @link: reference not found: this#getFailureReason()
[2022-07-28T02:55:14.011Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:116:
 warning - Tag @link: reference not found: this#getFailureMessage()
[2022-07-28T02:55:14.011Z] 

[RESULTS] [VOTE] Release Kafka version 3.2.1

2022-07-27 Thread David Arthur
The vote for RC3 has passed with eight +1 votes (three binding) and no -1
votes. Here are the results:

+1 votes
PMC:
* Randall Hauch
* Rajini Sivaram
* Bill Bejeck

Committers:
None

Community:
* Christopher Shannon
* Federico Valeri
* Dongjoon Hyun
* Jakub Scholz
* Matthew de Detrich

0 Votes:
None

-1 Votes:
None

Vote Thread:
https://lists.apache.org/thread/kcr2xncr762sqy79rbl83w0hzw85w775

I'll continue with the release process and send out the release
announcement over the next few days.

Thanks!
David Arthur


Re: [VOTE] 3.2.1 RC3

2022-07-27 Thread David Arthur
I'm closing out the vote now. Thanks to everyone who voted. The RC passed
with the required number of votes. I'll send out the results thread shortly.

Cheers,
David Arthur

On Wed, Jul 27, 2022 at 11:54 AM Bill Bejeck  wrote:

> Hi David,
>
> Thanks for running the release!
>
> I did the following steps:
>
>- Validated all signatures and checksums
>- Built from source
>- Ran all the unit tests
>- I spot-checked the doc.  I did notice the same version number as
>Randall - but I expect that will get fixed when the docs are updated with
>the release.
>
> +1(binding)
>
> Thanks,
> Bill
>
> On Tue, Jul 26, 2022 at 5:56 PM Matthew Benedict de Detrich
>  wrote:
>
> > Thanks for the RC,
> >
> > I ran the full (unit + integration) tests using Scala 2.12 and 2.13
> across
> > OpenJDK (Linux) 11 and 17 and all tests passed apart from a single one
> > which is documented at https://issues.apache.org/jira/browse/KAFKA-13514
> >
> > +1 (non binding)
> >
> >
> >
> > On Fri, Jul 22, 2022 at 3:15 AM David Arthur 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first release candidate of Apache Kafka 3.2.1.
> > >
> > > This is a bugfix release with several fixes since the release of
> 3.2.0. A
> > > few of the major issues include:
> > >
> > > * KAFKA-14062 OAuth client token refresh fails with SASL extensions
> > > * KAFKA-14079 Memory leak in connectors using errors.tolerance=all
> > > * KAFKA-14024 Cooperative rebalance regression causing clients to get
> > stuck
> > >
> > >
> > > Release notes for the 3.2.1 release:
> > >
> https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/RELEASE_NOTES.html
> > >
> > >
> > >
> > >  Please download, test and vote by Wednesday July 27, 2022 at 17:00
> > PT.
> > > 
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/
> > >
> > > Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > Javadoc: https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/javadoc/
> > >
> > > Tag to be voted upon (off 3.2 branch) is the 3.2.1 tag:
> > > https://github.com/apache/kafka/releases/tag/3.2.1-rc3
> > >
> > > Documentation: https://kafka.apache.org/32/documentation.html
> > >
> > > Protocol: https://kafka.apache.org/32/protocol.html
> > >
> > >
> > > The past few builds have had flaky test failures. I will update this
> > thread
> > > with passing build links soon.
> > >
> > > Unit/Integration test job:
> > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.2/
> > > System test job:
> > > https://jenkins.confluent.io/job/system-test-kafka/job/3.2/
> > >
> > >
> > > Thanks!
> > > David Arthur
> > >
> >
> >
> > --
> >
> > Matthew de Detrich
> >
> > *Aiven Deutschland GmbH*
> >
> > Immanuelkirchstraße 26, 10405 Berlin
> >
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >
> > *m:* +491603708037
> >
> > *w:* aiven.io *e:* matthew.dedetr...@aiven.io
> >
>


[jira] [Resolved] (KAFKA-14119) Sensor in metrics has potential thread safety issues

2022-07-27 Thread Eric Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wu resolved KAFKA-14119.
-
Resolution: Not A Bug

> Sensor in metrics has potential thread safety issues
> 
>
> Key: KAFKA-14119
> URL: https://issues.apache.org/jira/browse/KAFKA-14119
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Reporter: Eric Wu
>Priority: Major
>
> There are potential issues of a `Sensor` not being protected from race 
> conditions when it 
> [records|https://github.com/apache/kafka/blob/6ac58ac6fcd53a512ea0bc0b3dc66f49870ff0cb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L230]
>  a value.
> It can be reproduced with a unit test, e.g., in `SensorTest`:
> {code:java}
> @Test
> public void testSensorRecordThreadSafety() {
> Time time = new MockTime(0, System.currentTimeMillis(), 0);
> Metrics metrics = new Metrics(time);
> Sensor sensor = metrics.sensor("sensor");
> MetricName metric = new MetricName("test", "test", "test", 
> Collections.emptyMap());
> sensor.add(metric, new Value());
> int totalRequests = 10;
> AtomicInteger count = new AtomicInteger();
> Executor threadPool = Executors.newFixedThreadPool(totalRequests);
> CompletableFuture[] futures = new CompletableFuture[totalRequests];
> for (int i = 0; i < totalRequests; i++) {
> futures[i] = CompletableFuture.runAsync(() -> {
> try {
> Thread.sleep(10); // to make it easier to repro
> } catch (InterruptedException e) {
> throw new RuntimeException(e);
> }
> sensor.record(count.addAndGet(1));
> }, threadPool);
> }
> CompletableFuture.allOf(futures).join();
> 
> assertEquals(1, sensor.metrics().size());
> double value = (double) sensor.metrics().get(0).metricValue();
> assertEquals(totalRequests, value);
> }{code}
> It needs some tweaks to make the fields visible in the test. Given a few 
> runs, the test should fail which demonstrates the thread safety issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-859: Add Metadata Log Processing Error Related Metrics

2022-07-27 Thread Niket Goel
Thanks for the review David.

Here are the answers to your questions. I will update the KIP to make the info 
clearer.

> 1) Does "publisher-error-count" represent the number of errors
> encountered only when loading the most recent image? Or will this value be
> the cumulative number of publisher errors since the broker started?
> 2) Same question for "listener-batch-load-error-count"
The intent is to have a cumulative number for both of these. The rationale is 
that any fault in loading an image (even if a subsequent load was OK) is worthy 
of inspection. It would be good to have a way to bring the count down to zero 
through an operator-initiated signal, but that could be a follow-up.

> 3) Will ForceRenounceCount be zero for non-leader controllers? Or will this
> value remain between elections and only get reset to zero upon a restart
I think it makes sense to keep these metrics for all controllers in the system. 
A forced resignation is usually looked at after it has happened, and at that 
point, the controller might not be the leader anymore.
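
As an aside for operators: because both counts are cumulative, the natural way to
consume them is to alert on any non-zero value. A minimal sketch of such a check
over JMX is below; the ObjectName and attribute name are hypothetical placeholders,
since the real metric names are defined in KIP-859:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MetadataErrorCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Placeholder MBean/attribute names, for illustration only.
            ObjectName name = new ObjectName("kafka.server:type=broker-metadata-metrics");
            Number errors = (Number) conn.getAttribute(name, "publisher-error-count");
            // The counter is cumulative, so any non-zero value means a fault has
            // occurred at some point since the process started.
            if (errors.longValue() > 0) {
                System.err.println("Metadata publisher errors observed: " + errors);
            }
        }
    }
}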

> On Jul 27, 2022, at 11:39 AM, David Arthur 
>  wrote:
> 
> Thanks for the KIP, Niket! I definitely agree we need to surface metadata
> processing errors to the operator. I have some questions about the
> semantics of the new metrics:
> 
> 1) Does "publisher-error-count" represent the number of errors
> encountered only when loading the most recent image? Or will this value be
> the cumulative number of publisher errors since the broker started?
> 2) Same question for "listener-batch-load-error-count"
> 3) Will ForceRenounceCount be zero for non-leader controllers? Or will this
> value remain between elections and only get reset to zero upon a restart
> 
> Thanks!
> David
> 
> On Wed, Jul 27, 2022 at 2:20 PM Niket Goel 
> wrote:
> 
>> 
>> Hi all,
>> 
>> I would like to start a discussion on adding some new metrics to KRaft to
>> allow for better visibility into log processing errors.
>> 
>> KIP URL:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-859%3A+Add+Metadata+Log+Processing+Error+Related+Metrics
>> 
>> Thanks!
>> Niket
>> 
>> 
> 
> -- 
> -David



[jira] [Created] (KAFKA-14119) Sensor in metrics has potential thread safety issues

2022-07-27 Thread Eric Wu (Jira)
Eric Wu created KAFKA-14119:
---

 Summary: Sensor in metrics has potential thread safety issues
 Key: KAFKA-14119
 URL: https://issues.apache.org/jira/browse/KAFKA-14119
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Reporter: Eric Wu


There are potential issues of a `Sensor` not being protected from race 
conditions when it 
[records|https://github.com/apache/kafka/blob/6ac58ac6fcd53a512ea0bc0b3dc66f49870ff0cb/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java#L230]
 a value.

It can be reproduced with a unit test, e.g., in `SensorTest`:
{code:java}
@Test
public void testSensorRecordThreadSafety() {
Time time = new MockTime(0, System.currentTimeMillis(), 0);
Metrics metrics = new Metrics(time);
Sensor sensor = metrics.sensor("sensor");
MetricName metric = new MetricName("test", "test", "test", 
Collections.emptyMap());
sensor.add(metric, new Value());

int totalRequests = 10;
AtomicInteger count = new AtomicInteger();
Executor threadPool = Executors.newFixedThreadPool(totalRequests);
CompletableFuture[] futures = new CompletableFuture[totalRequests];
for (int i = 0; i < totalRequests; i++) {
futures[i] = CompletableFuture.runAsync(() -> {
try {
Thread.sleep(10); // to make it easier to repro
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
sensor.record(count.addAndGet(1));
}, threadPool);
}
CompletableFuture.allOf(futures).join();

assertEquals(1, sensor.metrics().size());
double value = (double) sensor.metrics().get(0).metricValue();
assertEquals(totalRequests, value);
}{code}
It needs some tweaks to make the fields visible in the test. Given a few runs, 
the test should fail which demonstrates the thread safety issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-859: Add Metadata Log Processing Error Related Metrics

2022-07-27 Thread David Arthur
Thanks for the KIP, Niket! I definitely agree we need to surface metadata
processing errors to the operator. I have some questions about the
semantics of the new metrics:

1) Does "publisher-error-count" represent the number of errors
encountered only when loading the most recent image? Or will this value be
the cumulative number of publisher errors since the broker started?
2) Same question for "listener-batch-load-error-count"
3) Will ForceRenounceCount be zero for non-leader controllers? Or will this
value remain between elections and only get reset to zero upon a restart

Thanks!
David

On Wed, Jul 27, 2022 at 2:20 PM Niket Goel 
wrote:

>
> Hi all,
>
> I would like to start a discussion on adding some new metrics to KRaft to
> allow for better visibility into log processing errors.
>
> KIP URL:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-859%3A+Add+Metadata+Log+Processing+Error+Related+Metrics
>
> Thanks!
> Niket
>
>

-- 
-David


[DISCUSS] KIP-859: Add Metadata Log Processing Error Related Metrics

2022-07-27 Thread Niket Goel

Hi all,

I would like to start a discussion on adding some new metrics to KRaft to allow 
for better visibility into log processing errors.

KIP URL: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-859%3A+Add+Metadata+Log+Processing+Error+Related+Metrics
 

Thanks!
Niket



[jira] [Created] (KAFKA-14118) Generate and persist storage uuid

2022-07-27 Thread Jira
José Armando García Sancio created KAFKA-14118:
--

 Summary: Generate and persist storage uuid
 Key: KAFKA-14118
 URL: https://issues.apache.org/jira/browse/KAFKA-14118
 Project: Kafka
  Issue Type: Sub-task
Reporter: José Armando García Sancio


Both the StorageTool and RaftKafkaServer should generate a storage uuid and 
persist it in the meta.properties file.

The StorageTool should only create the meta.properties file if it doesn't already 
exist.

RaftKafkaServer should only generate a storage uuid if the meta.properties file 
exists but it doesn't contain a storage uuid.
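
For illustration, a sketch of what a meta.properties file with such an id might look 
like. The version, cluster.id and node.id keys are what KRaft writes today; the key 
name and value used for the storage uuid below are placeholders, not the final format:

{noformat}
version=1
cluster.id=DZYlLvq0RLyXSPHfFJStGg
node.id=1
# Placeholder key name for the new storage uuid:
storage.id=Mk3OEYBSD34fcwNTJENDM2Qw
{noformat}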



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14117) Flaky Test DynamicBrokerReconfigurationTest.testKeyStoreAlter

2022-07-27 Thread Hao Li (Jira)
Hao Li created KAFKA-14117:
--

 Summary: Flaky Test 
DynamicBrokerReconfigurationTest.testKeyStoreAlter
 Key: KAFKA-14117
 URL: https://issues.apache.org/jira/browse/KAFKA-14117
 Project: Kafka
  Issue Type: Test
  Components: unit tests
Reporter: Hao Li


This is a flaky test. Log:

 
{code:java}
[2022-07-27T11:44:23.102Z] DynamicBrokerReconfigurationTest > 
testKeyStoreAlter() FAILED [2022-07-27T11:44:23.102Z] 
org.opentest4j.AssertionFailedError: Duplicates not expected ==> expected: 
 but was:  [2022-07-27T11:44:23.102Z] at 
org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) 
[2022-07-27T11:44:23.102Z] at 
org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:40) 
[2022-07-27T11:44:23.103Z] at 
org.junit.jupiter.api.Assertions.assertFalse(Assertions.java:235) 
[2022-07-27T11:44:23.103Z] at 
kafka.server.DynamicBrokerReconfigurationTest.stopAndVerifyProduceConsume(DynamicBrokerReconfigurationTest.scala:1579)
 [2022-07-27T11:44:23.103Z] at 
kafka.server.DynamicBrokerReconfigurationTest.testKeyStoreAlter(DynamicBrokerReconfigurationTest.scala:399){code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14116) Update and add all of the Kafka Message Schemas

2022-07-27 Thread Jira
José Armando García Sancio created KAFKA-14116:
--

 Summary: Update and add all of the Kafka Message Schemas
 Key: KAFKA-14116
 URL: https://issues.apache.org/jira/browse/KAFKA-14116
 Project: Kafka
  Issue Type: Sub-task
Reporter: José Armando García Sancio


This includes:
 # LeaderChangeMessage
 # AddVoterRecord
 # RemoveVoterRecord
 # QuorumStateData
 # meta.properties
 # AddVoter
 # RemoveVoter
 # Fetch
 # FetchSnapshot
 # Vote
 # BeginQuorumEpoch
 # EndQuorumEpoch
 # DescribeQuorum



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-12566) Flaky Test MirrorConnectorsIntegrationSSLTest#testReplication

2022-07-27 Thread Hao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Li reopened KAFKA-12566:


Somehow this is still happening:

```

[2022-07-27T11:32:04.461Z] MirrorConnectorsIntegrationSSLTest > 
testReplication() FAILED [2022-07-27T11:32:04.461Z] 
org.opentest4j.AssertionFailedError: Condition not met within timeout 2. 
Offsets not translated downstream to primary cluster. ==> expected:  but 
was:  [2022-07-27T11:32:04.461Z] at 
org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) 
[2022-07-27T11:32:04.461Z] at 
org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40) 
[2022-07-27T11:32:04.461Z] at 
org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) 
[2022-07-27T11:32:04.461Z] at 
org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:334) 
[2022-07-27T11:32:04.462Z] at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:382) 
[2022-07-27T11:32:04.462Z] at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:331) 
[2022-07-27T11:32:04.462Z] at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:315) 
[2022-07-27T11:32:04.462Z] at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:305) 
[2022-07-27T11:32:04.462Z] at 
org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplication(MirrorConnectorsIntegrationBaseTest.java:319)
 [2022-07-27T11:32:04.462Z] at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[2022-07-27T11:32:04.462Z] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
[2022-07-27T11:32:04.462Z] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 [2022-07-27T11:32:04.462Z] at java.lang.reflect.Method.invoke(Method.java:498) 
[2022-07-27T11:32:04.462Z] at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
 [2022-07-27T11:32:04.462Z] at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
 [2022-07-27T11:32:04.462Z] at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
 [2022-07-27T11:32:04.463Z] at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
 [2022-07-27T11:32:04.464Z] at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
 [2022-07-27T11:32:04.464Z] at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 [2022-07-27T11:32:04.464Z] at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
 [2022-07-27T11:32:04.465Z] at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
 [2022-07-27T11:32:04.465Z] at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
 [2022-07-27T11:32:04.465Z] at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
 [2022-07-27T11:32:04.465Z] at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 [2022-07-27T11:32:04.465Z] at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
 [2022-07-27T11:32:04.465Z] at 

[jira] [Created] (KAFKA-14115) Password configs are logged in plaintext in KRaft

2022-07-27 Thread David Arthur (Jira)
David Arthur created KAFKA-14115:


 Summary: Password configs are logged in plaintext in KRaft
 Key: KAFKA-14115
 URL: https://issues.apache.org/jira/browse/KAFKA-14115
 Project: Kafka
  Issue Type: Bug
Reporter: David Arthur
 Fix For: 3.3.0, 3.4.0


While investigating KAFKA-14111, I also noticed that 
ConfigurationControlManager is logging sensitive configs in plaintext at INFO 
level.


{code}
[2022-07-27 12:14:09,927] INFO [Controller 1] ConfigResource(type=BROKER, 
name='1'): set configuration listener.name.external.ssl.key.password to bar 
(org.apache.kafka.controller.ConfigurationControlManager)
{code}

Once this new config reaches the broker, it is logged again, but this time it 
is redacted

{code}
[2022-07-27 12:14:09,957] INFO [BrokerMetadataPublisher id=1] Updating broker 1 
with new configuration : listener.name.external.ssl.key.password -> [hidden] 
(kafka.server.metadata.BrokerMetadataPublisher)
{code}
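
The broker side can redact because it knows which config keys are sensitive. Below is 
a minimal sketch of that kind of check; it is illustrative only and is not the actual 
ConfigurationControlManager code or the eventual fix:

{code:java}
import org.apache.kafka.common.config.ConfigDef;

// Sketch: render a config value for logging, hiding anything defined as PASSWORD.
// Unknown keys are treated as sensitive to stay on the safe side.
public final class ConfigRedactor {
    public static String loggableValue(ConfigDef def, String key, String value) {
        ConfigDef.ConfigKey configKey = def.configKeys().get(key);
        boolean sensitive = configKey == null || configKey.type == ConfigDef.Type.PASSWORD;
        return sensitive ? "[hidden]" : value;
    }
}
{code}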



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

2022-07-27 Thread Jason Gustafson
I don't think a major release is a requirement for a default change for
what it's worth. I do appreciate that there is a preference for not rocking
the boat though. For a little bit of background here, the problem we
have encountered in production since the idempotent producer became the
default is OOM errors due to huge numbers of producerIds that had to be
retained in the cache for 7 days. It is hard to prevent use cases from
emerging where producers are used and discarded rapidly. We will be using a
lower value for sure, but it would also be nice to reduce the likelihood
for the community to see this problem. The benefit of the caching
diminishes quickly over time since it is primarily meant to handle producer
retry windows. I do not think there is much difference between 1 days and 7
days from an application perspective, but it is a huge difference for the
broker's memory usage.

Best,
Jason
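
For reference, a sketch of the relevant broker properties. The new property name comes
from the KIP; the values below simply mirror the defaults being debated in this thread
and are placeholders, not decisions:

# server.properties (sketch)
# The lower default suggested above (1 day):
producer.id.expiration.ms=86400000
# Existing transactional id expiration default (7 days), unchanged by the KIP:
transactional.id.expiration.ms=604800000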

On Wed, Jul 27, 2022 at 2:57 AM Sagar  wrote:

> Thanks Justine for the KIP. I think it might be better to document the
> correlation between the new config and delivery.timeout.ms in the Public
> Interfaces Description.
>
> Also, I agree with Luke that for now setting a default to -1 should be
> good. We can look to switch to 1 day with major release.
>
> Thanks!
> Sagar.
>
> On Wed, Jul 27, 2022 at 9:05 AM Luke Chen  wrote:
>
> > Hi Justine,
> >
> > Thanks for the KIP.
> > I agree with you that we should try our best to keep backward
> > compatibility, although our intention is to have lower producer id
> > expiration timeout.
> > So, I think we should keep default to -1 IMO.
> > Maybe we change the default to 1 day in next major release (4.0)?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > 
> > wrote:
> >
> > > Thanks for taking a look Jason!
> > >
> > > I wondered if we wanted to have a smaller default but wasn't sure about
> > the
> > > compatibility story -- especially since there is the chance for
> producer
> > > IDs to expire silently.
> > > I do think that 1 day is fairly reasonable. If I don't hear any
> > conflicting
> > > opinions I can go ahead and update the default.
> > >
> > > Justine
> > >
> > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > 
> > > wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP. Although I hate seeing new configurations, I
> think
> > > this
> > > > is a good change. Combining these timeout behaviors into a single
> > > > configuration was definitely a mistake, but we didn't anticipate the
> > > > problem with the producer id cache. I do wonder if we can make the
> > > default
> > > > a bit lower to reduce the chances that users will hit the same memory
> > > > issues we have seen. After decoupling this configuration from
> > > > transactional.id.expiration.ms, the new timeout just needs to cover
> > the
> > > > longest duration that a producer might be retrying the same Produce
> > > > request. 7 days seems too high. Although I think it could go a fair
> > even
> > > > lower, perhaps 1 day is a reasonable place to start?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > 
> > > > wrote:
> > > >
> > > > > Hey Bill,
> > > > > Thanks! I was just going to say that hopefully
> > > > > transactional.id.expiration.ms would also be over the delivery
> > > timeout.
> > > > :)
> > > > > Thanks for the +1!
> > > > >
> > > > > Justine
> > > > >
> > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck 
> > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > I just took another look at the KIP, and I realize my
> > > > question/suggestion
> > > > > > about default values has already been addressed in the
> > > `Compatibility`
> > > > > > section.
> > > > > >
> > > > > > I'm +1 on the KIP.
> > > > > >
> > > > > > -Bill
> > > > > >
> > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck 
> > > wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the well written KIP, this looks like it will be a
> > > useful
> > > > > > > addition.
> > > > > > >
> > > > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > > > >
> > > > > > > You mentioned that setting the `producer.id.expiration.ms`
> less
> > > than
> > > > > the
> > > > > > > delivery timeout could lead to duplicates, which makes sense.
> To
> > > > help
> > > > > > > avoid this situation, do we want to consider a default value
> that
> > > is
> > > > > the
> > > > > > > same as the delivery timeout?
> > > > > > >
> > > > > > > Thanks again for the KIP.
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > > >  wrote:
> > > > > > >
> > > > > > >> Hey all!
> > > > > > >>
> > > > > > >> I'd like to start a discussion on my proposal to separate
> > > time-based
> > > > > > >> producer ID expiration from transactional ID expiration by
> > > > > introducing a
> > > > > > >> new configuration.
> 

[jira] [Created] (KAFKA-14114) KIP-859: Add Metadata Log Processing Error Related Metrics

2022-07-27 Thread Niket Goel (Jira)
Niket Goel created KAFKA-14114:
--

 Summary: KIP-859: Add Metadata Log Processing Error Related Metrics
 Key: KAFKA-14114
 URL: https://issues.apache.org/jira/browse/KAFKA-14114
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: Niket Goel
Assignee: Niket Goel
 Fix For: 3.3


Tracking Jira for KIP:859



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14113) KIP-856: KRaft Disk Failure Recovery

2022-07-27 Thread Jira
José Armando García Sancio created KAFKA-14113:
--

 Summary: KIP-856: KRaft Disk Failure Recovery
 Key: KAFKA-14113
 URL: https://issues.apache.org/jira/browse/KAFKA-14113
 Project: Kafka
  Issue Type: Improvement
Reporter: José Armando García Sancio






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.2.1 RC3

2022-07-27 Thread Bill Bejeck
Hi David,

Thanks for running the release!

I did the following steps:

   - Validated all signatures and checksums
   - Built from source
   - Ran all the unit tests
   - I spot-checked the doc.  I did notice the same version number as
   Randall - but I expect that will get fixed when the docs are updated with
   the release.

+1(binding)

Thanks,
Bill

On Tue, Jul 26, 2022 at 5:56 PM Matthew Benedict de Detrich
 wrote:

> Thanks for the RC,
>
> I ran the full (unit + integration) tests using Scala 2.12 and 2.13 across
> OpenJDK (Linux) 11 and 17 and all tests passed apart from a single one
> which is documented at https://issues.apache.org/jira/browse/KAFKA-13514
>
> +1 (non binding)
>
>
>
> On Fri, Jul 22, 2022 at 3:15 AM David Arthur 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first release candidate of Apache Kafka 3.2.1.
> >
> > This is a bugfix release with several fixes since the release of 3.2.0. A
> > few of the major issues include:
> >
> > * KAFKA-14062 OAuth client token refresh fails with SASL extensions
> > * KAFKA-14079 Memory leak in connectors using errors.tolerance=all
> > * KAFKA-14024 Cooperative rebalance regression causing clients to get
> stuck
> >
> >
> > Release notes for the 3.2.1 release:
> > https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/RELEASE_NOTES.html
> >
> >
> >
> >  Please download, test and vote by Wednesday July 27, 2022 at 17:00
> PT.
> > 
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/
> >
> > Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > Javadoc: https://home.apache.org/~davidarthur/kafka-3.2.1-rc3/javadoc/
> >
> > Tag to be voted upon (off 3.2 branch) is the 3.2.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.2.1-rc3
> >
> > Documentation: https://kafka.apache.org/32/documentation.html
> >
> > Protocol: https://kafka.apache.org/32/protocol.html
> >
> >
> > The past few builds have had flaky test failures. I will update this
> thread
> > with passing build links soon.
> >
> > Unit/Integration test job:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.2/
> > System test job:
> > https://jenkins.confluent.io/job/system-test-kafka/job/3.2/
> >
> >
> > Thanks!
> > David Arthur
> >
>
>
> --
>
> Matthew de Detrich
>
> *Aiven Deutschland GmbH*
>
> Immanuelkirchstraße 26, 10405 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> *m:* +491603708037
>
> *w:* aiven.io *e:* matthew.dedetr...@aiven.io
>


Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-07-27 Thread José Armando García Sancio
Hi Igor,

Thanks for the KIP. Looking forward to this improvement. I'll review your KIP.

I should mention that I started a discussion thread on KIP-856: KRaft
Disk Failure Recovery at
https://lists.apache.org/thread/ytv0t18cplwwwqcp77h6vry7on378jzj

Both KIPs introduce similar concepts. For example, both introduce
a storage uuid that is stored in meta.properties. At first glance
there are some minor differences. I suggest that we review each
other's KIP so that we can remove these minor differences. What do you
think?

Thanks!
--
-José


[DISCUSS] KIP-856: KRaft Disk Failure Recovery

2022-07-27 Thread José Armando García Sancio
Hi all,

I would like to start the discussion on my design to allow KRaft to
detect and recover from disk failures in the minority of voters. For
those following the discussion on KIP-853, this is a subset of that
KIP with only the mechanisms required to solve the problem described
in the Motivation section.

KIP URL: https://cwiki.apache.org/confluence/x/5iX1D

Thanks!
-José


Re: [DISCUSS] KIP-853: KRaft Voters Change

2022-07-27 Thread José Armando García Sancio
Hi all,

Community members Jason Gustafson, Colin P. McCabe and I have been
having some offline conversations.

At a high-level KIP-853 solves the problems:
1) How can KRaft detect and recover from disk failures on the minority
of the voters?
2) How can KRaft support a changing set of voter nodes?

I think that problem 2) is a superset of problem 1). The mechanism for
solving problem 2) can be used to solve problem 1). This is the reason
that I decided to design them together and proposed this KIP. Problem
2) adds the additional requirement of how observers (Brokers and new
Controllers) discover the leader. KIP-853 solves this problem by
returning the endpoint of the leader in all of the KRaft RPCs. There
are some concerns with this approach.

To solve problem 1) we don't need to return the leader's endpoint
since it is expressed in the controller.quorum.voters property. To
make faster progress on 1) I have decided to create "KIP-856: KRaft
Disk Failure Recovery" that just addresses this problem. I will be
starting a discussion thread for KIP-856 soon.

We can continue the discussion of KIP-853 here. If KIP-856 gets
approved I will either:
3) Modify KIP-853 to just describe the improvement needed on top of KIP-856.
4) Create a new KIP and abandon KIP-853. This new KIP will take into
account all of the discussion from this thread.

Thanks!
-- 
-José


Re: [DISCUSS] Website changes required for Apache projects

2022-07-27 Thread Divij Vaidya
Hi all

To conclude this thread, all required changes listed for adhering to ASF
guidelines (documented at https://issues.apache.org/jira/browse/KAFKA-13868)
have been merged to the website. If you find any other aspects where we are
not adhering to the ASF privacy policy, please feel free to
create a new ticket.

Thanks everyone for chiming in on this discussion thread.

--
Divij Vaidya



On Fri, Jul 22, 2022 at 5:08 PM Mickael Maison 
wrote:

> Hi,
>
> Don't get me wrong, the videos are great and it's definitively the
> type of content we want on the website. We just got to be careful that
> all content is vendor neutral. I'm not advocating for introducing new
> policies or processes, I think the current PR process should be good
> enough.
>
> As noted, in this case the main issue comes from Youtube automatically
> adding the channel branding to the videos. Also on the quickstart and
> intro videos Tim says he's from Confluent. The intro he uses in the
> Streams videos [0] is in my opinion preferable. If it's possible to
> address this without some major editing, I think it would be worth
> doing.
>
> Thanks,
> Mickael
>
> 0: https://kafka.apache.org/32/documentation/streams/
>
> On Fri, Jul 22, 2022 at 4:22 PM Bill Bejeck  wrote:
> >
> > Hi Divij,
> >
> > After thinking about the embedded videos some more I think it's probably
> > best for now to go with option 1 you presented above (text links to the
> > videos).
> > I will do a follow on PR for option #2 - creating an image placeholder
> that
> > will trigger the video once clicked.
> >
> > Thanks again for driving this update effort.
> >
> > -Bill
> >
> > On Thu, Jul 21, 2022 at 5:25 PM Bill Bejeck  wrote:
> >
> > > Hi All,
> > >
> > > I've filed an issue with INFRA (
> > > https://issues.apache.org/jira/browse/INFRA-23499) to ask about
> uploading
> > > the videos to the ASF YouTube channel, which would resolve the branding
> > > issue.
> > >
> > > Thanks,
> > > Bill
> > >
> > > On Thu, Jul 21, 2022 at 1:43 PM Bill Bejeck  wrote:
> > >
> > >> Hi Divij,
> > >>
> > >> First of all, let me say thanks for taking up this task.
> > >>
> > >> We seem to have two options:
> > >>> 1. Replace videos on the website with links to the videos OR
> > >>> 2. Take a placeholder image and use JS to trigger playback after the
> user
> > >>> clicks.
> > >>>
> > >>> I would suggest going with option#1 right now due to time
> constraints and
> > >>> create a ticket to do (more user friendly) option#2 in the future.*
> What
> > >>> do
> > >>> you think?*
> > >>>
> > >>
> > >> I'm inclined to go with option #2.
> > >>
> > >> But taking a look at the https://apache.org/ site, there's an
> embedded
> > >> video directly on the page, not an image or a link.
> > >>
> > >> So I'm wondering, since the video doesn't start playing right away and
> > >> requires a user to click to start it, that the "click image to start"
> > >> requirement is satisfied,
> > >>
> > >> as it aligns with what we see now on the Apache® Software Foundation
> page.
> > >>
> > >>
> > >> Regarding the branding, that's not in the video file itself but comes
> > >> from YouTube and the video's channel.
> > >>
> > >> I propose that we host the video on the Apache YouTube
> > >>  channel, and
> > >> that would take care of the branding issue.
> > >>
> > >>
> > >> What do you think?
> > >>
> > >>
> > >> On Thu, Jul 21, 2022 at 4:19 AM Divij Vaidya  >
> > >> wrote:
> > >>
> > >>> Thanks for chiming in with your opinions John/Mickael.
> > >>>
> > >>> The current set of videos are very helpful and removing them might
> be a
> > >>> disservice to our users. The most ideal solution would be to host the
> > >>> videos on Apache servers without any branding. Another less than
> ideal
> > >>> solution would be to host a repository of links to educational
> content on
> > >>> our website.
> > >>>
> > >>> As for the next steps, I am going to do the following which would
> help us
> > >>> get answers on whether solution 1 or solution 2 is more feasible.
> Please
> > >>> let me know if you think we need to do something different here.
> > >>> 1. Reach out to ASF legal and ask what permissions/licence would we
> > >>> require
> > >>> from the video owners to host the videos ourselves.
> > >>> 2. Reach out to ASF community mailing list
> > >>> <
> > >>>
> https://www.apache.org/foundation/mailinglists.html#foundation-community
> > >>> >
> > >>> and ask how other communities are hosting educational content.
> > >>>
> > >>> There is still an open question about how we decide what content gets
> > >>> added
> > >>> and what doesn't. I would propose that the model should be the same
> as
> > >>> accepting code changes i.e. it goes through a community review
> requiring
> > >>> votes committers/PMC members.
> > >>>
> > >>> Regards,
> > >>> Divij Vaidya
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Jul 21, 2022 at 3:57 AM John Roesler 

[GitHub] [kafka-site] mimaison merged pull request #429: KAFKA-13868: Replace embedded YouTube links with hyperlinks on streams page

2022-07-27 Thread GitBox


mimaison merged PR #429:
URL: https://github.com/apache/kafka-site/pull/429


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] showuon commented on pull request #429: KAFKA-13868: Replace embedded YouTube links with hyperlinks on streams page

2022-07-27 Thread GitBox


showuon commented on PR #429:
URL: https://github.com/apache/kafka-site/pull/429#issuecomment-1196778133

   @mimaison , do you want to have another look? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] divijvaidya commented on pull request #429: KAFKA-13868: Replace embedded YouTube links with hyperlinks on streams page

2022-07-27 Thread GitBox


divijvaidya commented on PR #429:
URL: https://github.com/apache/kafka-site/pull/429#issuecomment-1196687885

   @mimaison please review. Last one before we can close KAFKA-13868.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] divijvaidya opened a new pull request, #429: KAFKA-13868: Replace embedded YouTube links with hyperlinks on streams page

2022-07-27 Thread GitBox


divijvaidya opened a new pull request, #429:
URL: https://github.com/apache/kafka-site/pull/429

   This is a hot fix to merge the https://github.com/apache/kafka/pull/12438 
into kafka-site to satisfy the ASF privacy policy requirements. 
   
   ## Testing
   Tested locally.
   ![Screenshot 2022-07-27 at 14 46 
06](https://user-images.githubusercontent.com/71267/181250067-60fb0a07-b7e3-47e1-b61a-40b354ecddf7.png)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (KAFKA-13730) OAuth access token validation fails if it does not contain the "sub" claim

2022-07-27 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-13730.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> OAuth access token validation fails if it does not contain the "sub" claim
> --
>
> Key: KAFKA-13730
> URL: https://issues.apache.org/jira/browse/KAFKA-13730
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.1.0
>Reporter: Daniel Fonai
>Assignee: Kirk True
>Priority: Minor
> Fix For: 3.4.0
>
>
> Client authentication fails when configured to use OAuth and the JWT access 
> token does {*}not contain the sub claim{*}. This issue was discovered while 
> testing Kafka integration with Ping Identity OAuth server. According to 
> Ping's 
> [documentation|https://apidocs.pingidentity.com/pingone/devguide/v1/api/#access-tokens-and-id-tokens]:
> {quote}sub – A string that specifies the identifier for the authenticated 
> user. This claim is not present for client_credentials tokens.
> {quote}
> In this case Kafka broker rejects the token regardless of the 
> [sasl.oauthbearer.sub.claim.name|https://kafka.apache.org/documentation/#brokerconfigs_sasl.oauthbearer.sub.claim.name]
>  property value.
>  
> 
>  
> Steps to reproduce:
> 1. Client configuration:
> {noformat}
> security.protocol=SASL_PLAINTEXT
> sasl.mechanism=OAUTHBEARER
> sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler
> sasl.oauthbearer.token.endpoint.url=https://oauth.server.fqdn/token/endpoint
> sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule
>  required\
>  clientId="kafka-client"\
>  clientSecret="kafka-client-secret";
> sasl.oauthbearer.sub.claim.name=client_id # claim name for the principal to 
> be extracted from, needed for client side validation too
> {noformat}
> 2. Broker configuration:
> {noformat}
> sasl.enabled.mechanisms=...,OAUTHBEARER
> listener.name.sasl_plaintext.oauthbearer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule
>  required;
> listener.name.sasl_plaintext.oauthbearer.sasl.server.callback.handler.class=org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler
> sasl.oauthbearer.jwks.endpoint.url=https://oauth.server.fqdn/jwks/endpoint
> sasl.oauthbearer.expected.audience=oauth-audience # based on OAuth server 
> setup
> sasl.oauthbearer.sub.claim.name=client_id # claim name for the principal to 
> be extracted from
> {noformat}
> 3. Try to perform some client operation:
> {noformat}
> kafka-topics --bootstrap-server `hostname`:9092 --list --command-config 
> oauth-client.properties
> {noformat}
> Result:
> Client authentication fails due to invalid access token.
>  - client log:
> {noformat}
> [2022-03-11 16:21:20,461] ERROR [AdminClient clientId=adminclient-1] 
> Connection to node -1 (localhost/127.0.0.1:9092) failed authentication due 
> to: {"status":"invalid_token"} (org.apache.kafka.clients.NetworkClient)
> [2022-03-11 16:21:20,463] WARN [AdminClient clientId=adminclient-1] Metadata 
> update failed due to authentication error 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager)
> org.apache.kafka.common.errors.SaslAuthenticationException: 
> {"status":"invalid_token"}
> Error while executing topic command : {"status":"invalid_token"}
> [2022-03-11 16:21:20,468] ERROR 
> org.apache.kafka.common.errors.SaslAuthenticationException: 
> {"status":"invalid_token"}
>  (kafka.admin.TopicCommand$)
> {noformat}
>  - broker log:
> {noformat}
> [2022-03-11 16:21:20,150] WARN Could not validate the access token: JWT 
> (claims->{"client_id":"...","iss":"...","iat":1647012079,"exp":1647015679,"aud":[...],"env":"...","org":"..."})
>  rejected due to invalid claims or other invalid content. Additional details: 
> [[14] No Subject (sub) claim is present.] 
> (org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler)
> org.apache.kafka.common.security.oauthbearer.secured.ValidateException: Could 
> not validate the access token: JWT 
> (claims->{"client_id":"...","iss":"...","iat":1647012079,"exp":1647015679,"aud":[...],"env":"...","org":"..."})
>  rejected due to invalid claims or other invalid content. Additional details: 
> [[14] No Subject (sub) claim is present.]
>   at 
> org.apache.kafka.common.security.oauthbearer.secured.ValidatorAccessTokenValidator.validate(ValidatorAccessTokenValidator.java:159)
>   at 
> org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler.handleValidatorCallback(OAuthBearerValidatorCallbackHandler.java:184)
>   at 
> 

Re: [DISCUSS] KIP-837 Allow MultiCasting a Result Record.

2022-07-27 Thread Sagar
Thanks Walker for the comments

I have updated the KIP with all the suggestions.

Thanks!
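
For readers skimming the quoted discussion below, here is a rough Java sketch of the
interface shape being debated. The existing partition() signature matches what Kafka
Streams has today; the partitions() method, its default implementation, and the
null/empty-list semantics are only what the thread describes, not the final KIP-837 API:

import java.util.Collections;
import java.util.List;

public interface StreamPartitioner<K, V> {

    // Existing single-partition method; the thread discusses deprecating it.
    Integer partition(String topic, K key, V value, int numPartitions);

    // Proposed multicast variant. The default wraps the old method, so existing
    // implementations keep working. Per the discussion, an empty list could mean
    // "send to no partition" and null could signal "broadcast to all partitions".
    default List<Integer> partitions(String topic, K key, V value, int numPartitions) {
        return Collections.singletonList(partition(topic, key, value, numPartitions));
    }
}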

On Tue, Jul 12, 2022 at 10:59 PM Walker Carlson
 wrote:

> Hi Sagar,
>
> I just finished reading the KIP and this seems to be a great addition.
>
> I agree with Matthias that the interface with a default implementation and
> deprecating partition() does seem cleaner. It has been a pattern that we
> have followed in the past. How I would handle a custom streams partitioner
> is to just always call partitions(). If it is implemented then we ignore the
> partition() and if not, the default implementation should just wrap the
> deprecated method in a list.
>
> Despite that I think your concerns are valid about this causing some
> confusion. To avoid that in the past we have just made sure we updated the
> depreciation message very cleanly and also include that implementing the
> new method will override the old one in the description. All those docs
> plus good logging has worked well. We had a very similar situation when
> adding a new exception handler for streams back for 2.8 and these
> precautions seemed to be enough.
>
> thanks for the kip!
> Walker
>
> On Sun, Jul 10, 2022 at 1:22 PM Sagar  wrote:
>
> > Hi Matthias,
> >
> > I agree that working with interfaces is cleaner. As such, there's not
> much
> > value in keeping both the methods. So, we can go down the route of
> > deprecating partition(). The only question I have is till deprecation if
> we
> > get both partition() and partitions() implemented, we may need to give
> > precedence to partitions() method, right?
> >
> > Also, in IQ and FK-join the runtime check you proposed seems good to me
> and
> > your suggestion on broadcast makes sense as well.
> >
> > Lastly, I am leaning towards the interface approach now. Let's see if
> other
> > have any questions/comments.
> >
> > Thanks!
> > Sagar.
> >
> >
> > On Fri, Jul 8, 2022 at 4:31 AM Matthias J. Sax  wrote:
> >
> > > Thanks for explaining you reasoning.
> > >
> > > I agree that it might not be ideal to have both methods implemented,
> but
> > > if we deprecate the exiting one, it would only be an issue until we
> > > remove the old one? Or do we see value to keep both methods?
> > >
> > > In general, working with interfaces is cleaner than with abstract
> > > classed, that is why I proposed it.
> > >
> > > In the end, I don't have too strong of an opinion what the better
> option
> > > would be. Maybe others can chime in and share their thoughts?
> > >
> > > -Matthias
> > >
> > > On 7/1/22 10:54 PM, Sagar wrote:
> > > > Hi Matthias,
> > > >
> > > > Thanks for your review. The reason I chose to introduce a new
> abstract
> > > > class is that, while it doesn't entail any changes in the
> > > StreamPartitioner
> > > > interface, I also disabled the partition() method in that class.
> Reason
> > > to
> > > > do that is that I didn't want a scenario where a user implements both
> > > > partition and partitions methods which could lead to confusion. With
> > the
> > > > approach you suggested, while the interface still remains functional,
> > > users
> > > > get the option to implement either methods which is what I wanted to
> > > avoid.
> > > > Let me know if that makes sense.
> > > >
> > > > Regarding extending StreamsPartitioner, we could expose  net new to()
> > > > methods taking in this new class devoid of any StreamPartitioner. I
> > just
> > > > thought it's cleaner to keep it this way as StreamPartitioner already
> > > > does the partitioning. Let me know what you think.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Wed, Jun 29, 2022 at 5:34 AM Matthias J. Sax 
> > > wrote:
> > > >
> > > >> Thanks for the KIP. Overall a good addition.
> > > >>
> > > >> I am actually not sure if we need to add a new class? From my
> > > >> understanding, if there is exactly one abstract method, the
> interface
> > is
> > > >> still functional? Thus, we could add a new method to
> > > >> `StreamsPartitioner` with a default implementation (that calls the
> > > >> existing `partition()` method and wraps the result into a singleton
> > > list)?
> > > >>
> > > >> Not sure what the pros/cons for both approaches would be?
> > > >>
> > > >> If we really add a new class, I am wondering if it should inherit
> from
> > > >> `StreamsPartitioner` at all? Or course, if it does not, it's more
> > stuff
> > > >> we need to change, but the proposed overwrite of `partition()` that
> > > >> throws also does not seem to be super clean? -- I am even wondering
> if
> > > >> we should aim to deprecate the existing `partition()` and only offer
> > > >> `partitions()` in the future?
> > > >>
> > > >> For the broadcast option, I am wondering if returning `null` (not an
> > > >> singleton with `-1`) might be a clear option to encode it? Empty
> list
> > > >> would still be valid as "send to no partition".
> > > >>
> > > >> Btw: The `StreamPartitioner` interface is also used for IQ. For both
> > IQ
> > > >> and FK-join, it seem ok to just 

[DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-07-27 Thread Igor Soarez
Hi all,

I have a proposal to handle JBOD broker disk failures in KRaft mode.

With KIP-833, KRaft is being marked production ready and ZK mode is being 
deprecated, but support for JBOD is still a big missing feature. 

Please have a look and share your thoughts:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft

Thanks,

--
Igor


Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

2022-07-27 Thread Sagar
Thanks Justine for the KIP. I think it might be better to document the
correlation between the new config and delivery.timeout.ms in the Public
Interfaces Description.

Also, I agree with Luke that for now setting a default to -1 should be
good. We can look to switch to 1 day with major release.

Thanks!
Sagar.

On Wed, Jul 27, 2022 at 9:05 AM Luke Chen  wrote:

> Hi Justine,
>
> Thanks for the KIP.
> I agree with you that we should try our best to keep backward
> compatibility, although our intention is to have lower producer id
> expiration timeout.
> So, I think we should keep default to -1 IMO.
> Maybe we change the default to 1 day in next major release (4.0)?
>
> Thank you.
> Luke
>
> On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> 
> wrote:
>
> > Thanks for taking a look Jason!
> >
> > I wondered if we wanted to have a smaller default but wasn't sure about
> the
> > compatibility story -- especially since there is the chance for producer
> > IDs to expire silently.
> > I do think that 1 day is fairly reasonable. If I don't hear any
> conflicting
> > opinions I can go ahead and update the default.
> >
> > Justine
> >
> > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > 
> > wrote:
> >
> > > Hi Justine,
> > >
> > > Thanks for the KIP. Although I hate seeing new configurations, I think
> > this
> > > is a good change. Combining these timeout behaviors into a single
> > > configuration was definitely a mistake, but we didn't anticipate the
> > > problem with the producer id cache. I do wonder if we can make the
> > default
> > > a bit lower to reduce the chances that users will hit the same memory
> > > issues we have seen. After decoupling this configuration from
> > > transactional.id.expiration.ms, the new timeout just needs to cover
> the
> > > longest duration that a producer might be retrying the same Produce
> > > request. 7 days seems too high. Although I think it could go a fair
> > > bit lower, perhaps 1 day is a reasonable place to start?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > Hey Bill,
> > > > Thanks! I was just going to say that hopefully
> > > > transactional.id.expiration.ms would also be over the delivery
> > timeout.
> > > :)
> > > > Thanks for the +1!
> > > >
> > > > Justine
> > > >
> > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck 
> wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > I just took another look at the KIP, and I realize my
> > > question/suggestion
> > > > > about default values has already been addressed in the
> > `Compatibility`
> > > > > section.
> > > > >
> > > > > I'm +1 on the KIP.
> > > > >
> > > > > -Bill
> > > > >
> > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck 
> > wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > Thanks for the well written KIP, this looks like it will be a
> > useful
> > > > > > addition.
> > > > > >
> > > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > > >
> > > > > > You mentioned that setting the `producer.id.expiration.ms` less
> > than
> > > > the
> > > > > > delivery timeout could lead to duplicates, which makes sense.  To
> > > help
> > > > > > avoid this situation, do we want to consider a default value that
> > is
> > > > the
> > > > > > same as the delivery timeout?
> > > > > >
> > > > > > Thanks again for the KIP.
> > > > > >
> > > > > > Bill
> > > > > >
> > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan
> > > > > >  wrote:
> > > > > >
> > > > > >> Hey all!
> > > > > >>
> > > > > >> I'd like to start a discussion on my proposal to separate
> > time-based
> > > > > >> producer ID expiration from transactional ID expiration by
> > > > introducing a
> > > > > >> new configuration.
> > > > > >>
> > > > > >> The KIP is pretty small and simple, but will be helpful in
> > > controlling
> > > > > >> memory usage in brokers -- especially now that by default
> > producers
> > > > are
> > > > > >> idempotent and create producer ID state.
> > > > > >>
> > > > > >> Please take a look and leave any comments you may have!
> > > > > >>
> > > > > >> KIP:
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > >>
> > > > > >> Thanks!
> > > > > >> Justine
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
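
For reference, a hedged configuration sketch of how the settings discussed in
this thread relate (the values shown are the ones floated above, not agreed
defaults):

# Broker side (KIP-854 proposal): time-based producer ID expiry, decoupled
# from transactional.id.expiration.ms. Keeping it at or above the producers'
# delivery.timeout.ms avoids a still-retrying producer having its ID state
# expire underneath it, which could surface as duplicates.
producer.id.expiration.ms=86400000         # 1 day, the value floated above
transactional.id.expiration.ms=604800000   # 7 days, existing default

# Producer side (existing config, shown for comparison):
delivery.timeout.ms=120000                 # 2 minutes, existing default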


Re: [DISCUSS] KIP-844: Transactional State Stores

2022-07-27 Thread Alexander Sorokoumov
Hey Nick,

Thank you for the kind words and the feedback! I'll definitely add an
option to configure the transactional mechanism in the Stores factory methods
via an argument, as John previously suggested, and I might add the in-memory
option via RocksDB Indexed Batches if I figure out why their creation via
the RocksDB JNI fails with `UnsatisfiedLinkException`.

Best,
Alex

On Wed, Jul 27, 2022 at 11:46 AM Alexander Sorokoumov <
asorokou...@confluent.io> wrote:

> Hey Guozhang,
>
> 1) About the param passed into the `recover()` function: it seems to me
>> that the semantics of "recover(offset)" is: recover this state to a
>> transaction boundary which is at least the passed-in offset. And the only
>> possibility that the returned offset is different than the passed-in
>> offset
>> is if the previous failure happens after we've done all the commit
>> procedures except writing the new checkpoint, in which case the returned
>> offset would be larger than the passed-in offset. Otherwise it should
>> always be equal to the passed-in offset, is that right?
>
>
> Right now, the only case when `recover` returns an offset different from
> the passed one is when the failure happens *during* commit.
>
>
> If the failure happens after commit but before the checkpoint, `recover`
> might return either the passed offset or a newer committed offset, depending
> on the implementation. The `recover` implementation in the prototype returns
> the passed offset because it deletes the commit marker that holds that offset
> after the commit is done. In that case, the store will replay the last
> commit from the changelog. I think it is fine as the changelog replay is
> idempotent.
>
> 2) It seems the only use for the "transactional()" function is to determine
>> if we can update the checkpoint file while in EOS.
>
>
> Right now, there are 2 other uses for `transactional()`:
> 1. To determine what to do during initialization if the checkpoint is gone
> (see [1]). If the state store is transactional, we don't have to wipe the
> existing data. Thinking about it now, we do not really need to check
> whether the store is `transactional` because if it is not, we'd not have
> written the checkpoint in the first place. I am going to remove that check.
> 2. To determine if the persistent kv store in KStreamImplJoin should be
> transactional (see [2], [3]).
>
> I am not sure if we can get rid of the checks in point 2. If so, I'd be
> happy to encapsulate `transactional()` logic in `commit/recover`.
>
> Best,
> Alex
>
> 1.
> https://github.com/apache/kafka/pull/12393/files#diff-971d9ef7ea8aef687fc7ee131bd166ced94445f4ab55aa83007541dccfdaL256-R281
> 2.
> https://github.com/apache/kafka/pull/12393/files#diff-9ce43046fdef1233ab762e728abd1d3d44d7c270b28dcf6b63aa31a93a30af07R266-R278
> 3.
> https://github.com/apache/kafka/pull/12393/files#diff-9ce43046fdef1233ab762e728abd1d3d44d7c270b28dcf6b63aa31a93a30af07R348-R354
>
> On Tue, Jul 26, 2022 at 6:39 PM Nick Telford 
> wrote:
>
>> Hi Alex,
>>
>> Excellent proposal, I'm very keen to see this land!
>>
>> Would it be useful to permit configuring the type of store used for
>> uncommitted offsets on a store-by-store basis? This way, users could
>> choose
>> whether to use, e.g. an in-memory store or RocksDB, potentially reducing
>> the overheads associated with RocksDB for smaller stores, but without the
>> memory pressure issues?
>>
>> I suspect that in most cases, the number of uncommitted records will be
>> very small, because the default commit interval is 100ms.
>>
>> Regards,
>>
>> Nick
>>
>> On Tue, 26 Jul 2022 at 01:36, Guozhang Wang  wrote:
>>
>> > Hello Alex,
>> >
>> > Thanks for the updated KIP, I looked over it and browsed the WIP and
>> just
>> > have a couple meta thoughts:
>> >
>> > 1) About the param passed into the `recover()` function: it seems to me
>> > that the semantics of "recover(offset)" is: recover this state to a
>> > transaction boundary which is at least the passed-in offset. And the
>> only
>> > possibility that the returned offset is different than the passed-in
>> offset
>> > is if the previous failure happens after we've done all the commit
>> > procedures except writing the new checkpoint, in which case the returned
>> > offset would be larger than the passed-in offset. Otherwise it should
>> > always be equal to the passed-in offset, is that right?
>> >
>> > 2) It seems the only use for the "transactional()" function is to
>> determine
>> > if we can update the checkpoint file while in EOS. But the purpose of
>> the
>> > checkpoint file's offsets is just to tell "the local state's current
>> > snapshot's progress is at least the indicated offsets" anyways, and with
>> > this KIP maybe we would just do:
>> >
>> > a) when in ALOS, upon failover: we set the starting offset as
>> > checkpointed-offset, then restore() from changelog till the end-offset.
>> > This way we may restore some records twice.
>> > b) when in EOS, upon failover: we first call
>> recover(checkpointed-offset),

Re: [DISCUSS] KIP-844: Transactional State Stores

2022-07-27 Thread Alexander Sorokoumov
Hey Guozhang,

1) About the param passed into the `recover()` function: it seems to me
> that the semantics of "recover(offset)" is: recover this state to a
> transaction boundary which is at least the passed-in offset. And the only
> possibility that the returned offset is different than the passed-in offset
> is if the previous failure happens after we've done all the commit
> procedures except writing the new checkpoint, in which case the returned
> offset would be larger than the passed-in offset. Otherwise it should
> always be equal to the passed-in offset, is that right?


Right now, the only case when `recover` returns an offset different from
the passed one is when the failure happens *during* commit.


If the failure happens after commit but before the checkpoint, `recover`
might return either the passed offset or a newer committed offset, depending on
the implementation. The `recover` implementation in the prototype returns the
passed offset because it deletes the commit marker that holds that offset
after the commit is done. In that case, the store will replay the last
commit from the changelog. I think it is fine as the changelog replay is
idempotent.

2) It seems the only use for the "transactional()" function is to determine
> if we can update the checkpoint file while in EOS.


Right now, there are 2 other uses for `transactional()`:
1. To determine what to do during initialization if the checkpoint is gone
(see [1]). If the state store is transactional, we don't have to wipe the
existing data. Thinking about it now, we do not really need to check
whether the store is `transactional` because if it is not, we'd not have
written the checkpoint in the first place. I am going to remove that check.
2. To determine if the persistent kv store in KStreamImplJoin should be
transactional (see [2], [3]).

I am not sure if we can get rid of the checks in point 2. If so, I'd be
happy to encapsulate `transactional()` logic in `commit/recover`.

Best,
Alex

1.
https://github.com/apache/kafka/pull/12393/files#diff-971d9ef7ea8aef687fc7ee131bd166ced94445f4ab55aa83007541dccfdaL256-R281
2.
https://github.com/apache/kafka/pull/12393/files#diff-9ce43046fdef1233ab762e728abd1d3d44d7c270b28dcf6b63aa31a93a30af07R266-R278
3.
https://github.com/apache/kafka/pull/12393/files#diff-9ce43046fdef1233ab762e728abd1d3d44d7c270b28dcf6b63aa31a93a30af07R348-R354

On Tue, Jul 26, 2022 at 6:39 PM Nick Telford  wrote:

> Hi Alex,
>
> Excellent proposal, I'm very keen to see this land!
>
> Would it be useful to permit configuring the type of store used for
> uncommitted offsets on a store-by-store basis? This way, users could choose
> whether to use, e.g. an in-memory store or RocksDB, potentially reducing
> the overheads associated with RocksDB for smaller stores, but without the
> memory pressure issues?
>
> I suspect that in most cases, the number of uncommitted records will be
> very small, because the default commit interval is 100ms.
>
> Regards,
>
> Nick
>
> On Tue, 26 Jul 2022 at 01:36, Guozhang Wang  wrote:
>
> > Hello Alex,
> >
> > Thanks for the updated KIP, I looked over it and browsed the WIP and just
> > have a couple meta thoughts:
> >
> > 1) About the param passed into the `recover()` function: it seems to me
> > that the semantics of "recover(offset)" is: recover this state to a
> > transaction boundary which is at least the passed-in offset. And the only
> > possibility that the returned offset is different than the passed-in
> offset
> > is if the previous failure happens after we've done all the commit
> > procedures except writing the new checkpoint, in which case the returned
> > offset would be larger than the passed-in offset. Otherwise it should
> > always be equal to the passed-in offset, is that right?
> >
> > 2) It seems the only use for the "transactional()" function is to
> determine
> > if we can update the checkpoint file while in EOS. But the purpose of the
> > checkpoint file's offsets is just to tell "the local state's current
> > snapshot's progress is at least the indicated offsets" anyways, and with
> > this KIP maybe we would just do:
> >
> > a) when in ALOS, upon failover: we set the starting offset as
> > checkpointed-offset, then restore() from changelog till the end-offset.
> > This way we may restore some records twice.
> > b) when in EOS, upon failover: we first call
> recover(checkpointed-offset),
> > then set the starting offset as the returned offset (which may be larger
> > than checkpointed-offset), then restore until the end-offset.
> >
> > So why not also:
> > c) we let the `commit()` function also return an offset, which
> indicates
> > "checkpointable offsets".
> > d) for existing non-transactional stores, we just have a default
> > implementation of "commit()" which is simply a flush, and returns a
> > sentinel value like -1. Then later if we get checkpointable offset -1, we
> > do not write the checkpoint. Upon clean shutdown we can just
> > checkpoint regardless 
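
To make options (c) and (d) above concrete, here is a minimal Java sketch of
what such a commit/recover contract could look like (the interface name, method
signatures, and sentinel are illustrative assumptions, not the KIP's final API):

// Illustrative sketch of the commit/recover contract discussed above; the
// names, signatures, and sentinel are assumptions, not the KIP's final API.
public interface TransactionalCommitSupport {

    // Sentinel meaning "nothing safe to checkpoint" (option d above).
    long NO_CHECKPOINT = -1L;

    // Atomically publish all uncommitted writes and return the changelog offset
    // that is now safe to write to the checkpoint file. A plain non-transactional
    // store's default implementation would just flush and return NO_CHECKPOINT.
    long commit(long changelogOffset);

    // Roll the store to a transaction boundary that is at least the checkpointed
    // offset and return the offset actually recovered to; under EOS, restoration
    // then replays the changelog from the returned offset to the end offset.
    long recover(long checkpointedOffset);
}

With a contract like this, the EOS failover in (b) becomes: call
recover(checkpointed-offset), then restore from the returned offset to the
end offset, while ALOS keeps restoring from the checkpointed offset as in (a).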

Re: [DISCUSS] KIP-853: KRaft Voters Change

2022-07-27 Thread Jack Vanlightly
Hi Jose,

It's looking good!

> I think that when a replica has caught up is an implementation detail
> and we can have this detailed discussion in Jira or the PR. What do
> you think?

Yes, that sounds fine. For what it's worth, having the leader decide when
an observer is caught up greatly simplifies things, as the leader has all
the necessary information.

Thanks
Jack


Re: [DISCUSS]: Including TLA+ in the repo

2022-07-27 Thread Jack Vanlightly
+1 for me too. Once KIP-853 is agreed I will make any necessary changes
and submit a PR to the apache/kafka repo.

Jack

On Tue, Jul 26, 2022 at 10:10 PM Ismael Juma  wrote:

> I'm +1 for inclusion in the main repo and I was going to suggest the same
> in the KIP-853 discussion. The original authors of [3] and [4] are part of the
> Kafka community, so we can ask them to submit PRs.
>
> Ismael
>
> On Tue, Jul 26, 2022 at 7:58 AM Tom Bentley  wrote:
>
> > Hi,
> >
> > I noticed that TLA+ has featured in the Test Plans of a couple of recent
> > KIPs [1,2]. This is a good thing in my opinion. I'm aware that TLA+ has
> > been used in the past to prove properties of various parts of the Kafka
> > protocol [3,4].
> >
> > The point I wanted to raise is that I think it would be beneficial to the
> > community if these models could be part of the main Kafka repo. That way
> > there are fewer hurdles to their discoverability and it makes it easier
> for
> > people to compare the implementation with the spec. Spreading familiarity
> > with TLA+ within the community is also a potential side-benefit.
> >
> > I notice that the specs in [4] are MIT-licensed, but according to the
> > Apache 3rd party license policy [5] it should be OK to include.
> >
> > Thoughts?
> >
> > Kind regards,
> >
> > Tom
> >
> > [1]:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-TestPlan
> > [2]:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes#KIP853:KRaftVoterChanges-TestPlan
> > [3]: https://github.com/hachikuji/kafka-specification
> > [4]:
> >
> >
> https://github.com/Vanlightly/raft-tlaplus/tree/main/specifications/pull-raft
> > [5]: https://www.apache.org/legal/resolved.html
> >
>


[jira] [Created] (KAFKA-14112) Expose replication-offset-lag Mirror metric

2022-07-27 Thread Elkhan Eminov (Jira)
Elkhan Eminov created KAFKA-14112:
-

 Summary: Expose replication-offset-lag Mirror metric
 Key: KAFKA-14112
 URL: https://issues.apache.org/jira/browse/KAFKA-14112
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Reporter: Elkhan Eminov
Assignee: Elkhan Eminov


The offset lag is the difference between the end offset of the source partition 
(LEO) and the source offset of the last replicated record (LRO).
The offset lag is a difference (LEO-LRO), but its constituents are calculated at 
different points in time and in different places:
 * LEO shall be calculated during the source task's poll loop (it is readily 
available from the consumer)
 * LRO shall be kept in an in-memory cache that is updated in the task's 
producer callback

LRO is initialized from the offset store when the task is started. The difference 
shall be calculated whenever the freshest LEO is acquired
in the poll loop. The calculated value shall be exposed as a MirrorMaker 
metric.

This would describe the number of "to be replicated" records for a 
certain topic-partition.
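
A rough Java sketch of the LEO/LRO bookkeeping described above (the class,
method names, and wiring are assumptions for illustration only, not the actual
MirrorMaker code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;

// Hypothetical helper illustrating the bookkeeping behind a
// replication-offset-lag metric.
public class ReplicationOffsetLagTracker {

    // LRO per source partition, updated from the producer callback once a
    // record has been acknowledged on the target cluster.
    private final Map<TopicPartition, Long> lastReplicatedOffsets = new ConcurrentHashMap<>();

    // Called from the producer callback after a successful send.
    public void recordReplicated(TopicPartition sourcePartition, long sourceOffset) {
        lastReplicatedOffsets.merge(sourcePartition, sourceOffset, Math::max);
    }

    // Called from the poll loop with the freshest log end offset (LEO) taken
    // from the consumer; returns the number of records still to be replicated.
    public long lag(TopicPartition sourcePartition, long logEndOffset) {
        Long lro = lastReplicatedOffsets.get(sourcePartition);
        if (lro == null) {
            // Nothing replicated yet in this task's lifetime; report the full backlog.
            return Math.max(0L, logEndOffset);
        }
        // LEO is the offset of the next record to be written, so records with
        // offsets lro+1 .. LEO-1 are still pending.
        return Math.max(0L, logEndOffset - (lro + 1));
    }
}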



--
This message was sent by Atlassian Jira
(v8.20.10#820010)