[jira] [Created] (KAFKA-13474) Regression in dynamic update of client-side SSL factory

2021-11-23 Thread Igor Shipenkov (Jira)
Igor Shipenkov created KAFKA-13474:
--

 Summary: Regression in dynamic update of client-side SSL factory
 Key: KAFKA-13474
 URL: https://issues.apache.org/jira/browse/KAFKA-13474
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.7.2, 2.7.0
Reporter: Igor Shipenkov
 Attachments: failed-controller-single-session-2029.pcap.gz

h1. Problem
It seems that after updating a listener's SSL certificate via a dynamic configuration 
update, the old certificate is somehow still used by the client-side SSL factory. 
Because of this, the broker fails to create new connections to the controller once 
the old certificate expires.

h1. History
Back in KAFKA-8336 there was an issue where the client-side SSL factory wasn't 
updating its certificate when it was changed via dynamic configuration. That bug 
was fixed in version 2.3, and I can confirm that dynamic update worked for us with 
Kafka 2.4. But now we have updated our clusters to 2.7 and see this (or at least a 
similar) problem again.

h1. Affected versions
We first saw this on Confluent 6.1.2, which (I think) is based on Kafka 2.7.0. 
I then tried vanilla versions 2.7.0 and 2.7.2 and can reproduce the problem on them 
just fine.

h1. How to reproduce
* Have ZooKeeper running somewhere (in my example it will be "10.88.0.21:2181").
* Get the vanilla 2.7.2 (or 2.7.0) release from https://kafka.apache.org/downloads .
* Make a basic broker config like this (don't forget to actually create the log.dirs directory):
{code}
broker.id=1

listeners=SSL://:9092
advertised.listeners=SSL://localhost:9092

log.dirs=/tmp/broker1/data

zookeeper.connect=10.88.0.21:2181

security.inter.broker.protocol=SSL
ssl.protocol=TLSv1.2
ssl.client.auth=required
ssl.endpoint.identification.algorithm=
ssl.keystore.type=PKCS12
ssl.keystore.location=/tmp/broker1/secrets/broker1.keystore.p12
ssl.keystore.password=changeme1
ssl.key.password=changeme1
ssl.truststore.type=PKCS12
ssl.truststore.location=/tmp/broker1/secrets/truststore.p12
ssl.truststore.password=changeme
{code}
(I use TLS 1.2 here just so I can see the client certificate in the TLS handshake; 
you will get the same error with the default TLS 1.3 too)
** Repeat this config for another 2 brokers, changing the id, listener port and 
certificate accordingly.
* Make a basic client config (I use one of the brokers' certificates for it):
{code}
security.protocol=SSL
ssl.key.password=changeme1
ssl.keystore.type=PKCS12
ssl.keystore.location=/tmp/broker1/secrets/broker1.keystore.p12
ssl.keystore.password=changeme1
ssl.truststore.type=PKCS12
ssl.truststore.location=/tmp/broker1/secrets/truststore.p12
ssl.truststore.password=changeme
ssl.endpoint.identification.algorithm=
{code}
* Create the usual local self-signed PKI for the test
** generate a self-signed CA certificate and private key. Place the certificate in 
the truststore.
** create keys for the broker certificates and create signing requests from them as 
usual (I'll use the same subject for all brokers here)
** create 2 certificates as usual
{code}
openssl x509 \
   -req -CAcreateserial -days 1 \
   -CA ca/ca-cert.pem -CAkey ca/ca-key.pem \
   -in broker1.csr -out broker1.crt
{code}
** Use the "faketime" utility to make the third certificate expire soon:
{code}
# the date here is some point yesterday, so the certificate will expire 10-15 minutes from now
faketime "2021-11-23 10:15" openssl x509 \
   -req -CAcreateserial -days 1 \
   -CA ca/ca-cert.pem -CAkey ca/ca-key.pem \
   -in broker2.csr -out broker2.crt
{code}
** create keystores from the certificates and place them according to the broker 
configs from earlier
* Run the 3 brokers with your configs, like
{code}
./bin/kafka-server-start.sh server2.properties
{code}
(I start it here without daemon mode to see the logs right in the terminal; just use 
"tmux" or something to run the 3 brokers simultaneously)
** you can check that one broker certificate will expire soon with
{code}
openssl s_client -connect localhost:9093
{code}
* Replace the expiring certificate with a freshly issued one via a dynamic 
configuration update. After the old certificate expires, the affected broker fails 
to connect to the controller (its log shows errors from 
kafka.server.BrokerToControllerRequestThread), and the controller log will show 
something like
{code}
INFO [SocketServer brokerId=1] Failed authentication with /127.0.0.1 (SSL 
handshake failed) (org.apache.kafka.common.network.Selector)
{code}
and if the broker with the expired and replaced certificate is the controller 
itself, it cannot even connect to itself.
* If you make a traffic dump (and you use TLS 1.2 or lower) you will see that the 
client tries to use the old certificate.
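For reference, the PKI steps above can be sketched end to end roughly as follows. This is a minimal illustration only: the paths, subjects and passwords are taken from the configs above where given and assumed otherwise, and the faketime variant for the third certificate is omitted.

```shell
mkdir -p ca /tmp/broker1/secrets

# self-signed test CA; its certificate goes into a PKCS12 truststore
openssl req -new -x509 -days 365 -nodes -subj "/CN=test-ca" \
  -keyout ca/ca-key.pem -out ca/ca-cert.pem
openssl pkcs12 -export -nokeys -in ca/ca-cert.pem \
  -passout pass:changeme -out /tmp/broker1/secrets/truststore.p12

# broker key + signing request (same subject for all brokers), signed by the CA
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=localhost" \
  -keyout broker1.key -out broker1.csr
openssl x509 -req -CAcreateserial -days 1 \
  -CA ca/ca-cert.pem -CAkey ca/ca-key.pem \
  -in broker1.csr -out broker1.crt

# bundle key + certificate into the keystore referenced by the broker config
openssl pkcs12 -export -inkey broker1.key -in broker1.crt \
  -name broker1 -passout pass:changeme1 \
  -out /tmp/broker1/secrets/broker1.keystore.p12

# confirm the expiry date of a certificate
openssl x509 -noout -enddate -in broker1.crt
```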

Here is an example traffic dump where the broker with the expired and dynamically 
replaced certificate is the current controller, so it can't connect to itself. 
In this example you will see that the "Server" uses the new certificate and the 
"Client" uses the old certificate, but it's the same broker!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13117) After processors, migrate TupleForwarder and CacheFlushListener

2021-11-23 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-13117.
--
Resolution: Fixed

[https://github.com/apache/kafka/pull/11481]

> After processors, migrate TupleForwarder and CacheFlushListener
> ---
>
> Key: KAFKA-13117
> URL: https://issues.apache.org/jira/browse/KAFKA-13117
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: John Roesler
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>
> Currently, both of these interfaces take plain values in combination with 
> timestamps:
> CacheFlushListener:
> {code:java}
> void apply(K key, V newValue, V oldValue, long timestamp)
> {code}
> TimestampedTupleForwarder
> {code:java}
>  void maybeForward(K key,
>V newValue,
>V oldValue,
>long timestamp){code}
> These are internally translated to the new PAPI, but after the processors are 
> migrated, there won't be a need to have this translation. We should update 
> both of these APIs to just accept {{Record>}}.





Re: [VOTE] KIP-798 Add possibility to write kafka headers in Kafka Console Producer

2021-11-23 Thread Luke Chen
Hi Florin,
I'm not a committer, but yes, this vote can be concluded now.

Thank you.
Luke

On Wed, Nov 24, 2021 at 3:08 AM Florin Akermann 
wrote:

> Thanks all.
>
> The 72h window is through.
>
> @Comitters can this vote be concluded?
>
> The vote on KIP-798 would pass with:
> 4 binding +1
> 1 non-binding +1
> no vetoes
>
> Thanks,
> Florin
>
>
> On Tue, 23 Nov 2021 at 06:59, Luke Chen  wrote:
>
> > Hi Florin,
> > Thanks for the update!
> >
> > +1 (non-binding)
> >
> > Thank you.
> > Luke
> >
> > On Tue, Nov 23, 2021 at 2:00 AM Florin Akermann <
> florin.akerm...@gmail.com
> > >
> > wrote:
> >
> > > Hi Bill and David,
> > >
> > > Thank you both for the vote.
> > > @David: KIP is updated.
> > >
> > > Florin
> > >
> > > On Mon, 22 Nov 2021 at 18:28, David Jacot  >
> > > wrote:
> > >
> > > > Hi Florin,
> > > >
> > > > Thanks for the KIP. I am +1 (binding).
> > > >
> > > > There is a small typo in the Proposed Changes section:
> > > > `parse.header` should be `parse.headers`.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Mon, Nov 22, 2021 at 6:20 PM Bill Bejeck 
> wrote:
> > > > >
> > > > > Hi Florin,
> > > > >
> > > > > Thanks for the KIP, this seems like a very useful addition.
> > > > >
> > > > > +1(binding).
> > > > >
> > > > > -Bill
> > > > >
> > > > > On Mon, Nov 22, 2021 at 12:00 PM Florin Akermann <
> > > > florin.akerm...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Luke and Tom
> > > > > >
> > > > > > @Tom: Thanks for the vote.
> > > > > >
> > > > > > @Luke: Thanks for the feedback.
> > > > > >
> > > > > > I have updated the KIP accordingly with regards to your comments
> on
> > > the
> > > > > > remaining case (false,false) and the motivation.
> > > > > >
> > > > > > Regarding the "not only UTF-8": As far as I understand John it is
> > > fine
> > > > to
> > > > > > limit the scope for this change to UTF-8 only as it is a handy
> > > > addition on
> > > > > > its own. Other formats can be relatively easily supported by
> adding
> > > > more
> > > > > > properties in later KIPs. In my reply to John (email from 21 Nov
> > > 2021,
> > > > > > 11:29 UTC) I also added an explanation why I limited the scope to
> > > UTF-8
> > > > > > only.
> > > > > >
> > > > > > Thanks,
> > > > > > Florin
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, 22 Nov 2021 at 10:32, Tom Bentley 
> > > wrote:
> > > > > >
> > > > > > > Hi Florin,
> > > > > > >
> > > > > > > Thanks for the KIP!
> > > > > > >
> > > > > > > +1 (binding),
> > > > > > >
> > > > > > > Kind regards,
> > > > > > >
> > > > > > > Tom
> > > > > > >
> > > > > > > On Mon, Nov 22, 2021 at 6:51 AM Luke Chen 
> > > wrote:
> > > > > > >
> > > > > > > > Hi Florin,
> > > > > > > > Thanks for the KIP.
> > > > > > > >
> > > > > > > > This KIP makes sense to me. Just a comment that the
> motivation
> > > > section
> > > > > > is
> > > > > > > > not clearly explain why this KIP is important.
> > > > > > > > I think John already mentioned a good motivation, which is to
> > > > support
> > > > > > > "not
> > > > > > > > only UTF-8".
> > > > > > > > You should put that into the KIP, and of course if you have
> > other
> > > > > > > thoughts,
> > > > > > > > please also add them into KIP.
> > > > > > > >
> > > > > > > > Also, in the "public interface" section, there are 3 "Default
> > > > parsing
> > > > > > > > pattern", I think you should add 1 remaining case (false,
> > false)
> > > to
> > > > > > make
> > > > > > > it
> > > > > > > > complete.
> > > > > > > >
> > > > > > > > Otherwise, look good to me.
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Nov 21, 2021 at 7:37 PM Florin Akermann <
> > > > > > > florin.akerm...@gmail.com
> > > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi John,
> > > > > > > > >
> > > > > > > > > Thanks for the vote and feedback.
> > > > > > > > >
> > > > > > > > > The thought occurred to me too.
> > > > > > > > >
> > > > > > > > > Do I understand it correctly: the current version of the
> > > > > > > > > kafka-console-producer cannot be used for anything other
> than
> > > > UTF-8
> > > > > > > keys
> > > > > > > > > and values?
> > > > > > > > > (There is no other implementation of MessageReader other
> than
> > > the
> > > > > > > > > ConsoleProducer$LineMessageReader)
> > > > > > > > > In other words, currently users seem to only apply it with
> > > utf-8
> > > > > > > strings
> > > > > > > > > for keys and values?
> > > > > > > > > This is why I figured I would not deviate from this
> > assumption
> > > > solely
> > > > > > > for
> > > > > > > > > the headers.
> > > > > > > > >
> > > > > > > > > I will happily raise another KIP / Jira if there is a need
> to
> > > > specify
> > > > > > > > other
> > > > > > > > > formats / serializers for headers, keys and/or values.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Florin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sat, 20 Nov 2021 at 19:34, John R

[jira] [Created] (KAFKA-13473) Log cleaner Dynamic configs aren't applied after a restart

2021-11-23 Thread Tim Patterson (Jira)
Tim Patterson created KAFKA-13473:
-

 Summary: Log cleaner Dynamic configs aren't applied after a restart
 Key: KAFKA-13473
 URL: https://issues.apache.org/jira/browse/KAFKA-13473
 Project: Kafka
  Issue Type: Bug
  Components: config, core
Affects Versions: 2.8.1
Reporter: Tim Patterson


Upon restarting Kafka, dynamically configured log cleaner configs aren't picked up 
and applied.

 
Here are some logs from a local Kafka instance when I increase the threads to 2 
using the kafka-configs tool. Note the last 2 lines, where it starts up 2 log 
cleaner threads.

 
{code:java}
[2021-11-23 21:09:50,044] INFO [Admin Manager on Broker 1001]: Updating brokers with new configuration : log.cleaner.threads -> 2 (kafka.server.ZkAdminManager)
[2021-11-23 21:09:50,092] INFO Processing override for entityPath: brokers/ with config: HashMap(log.cleaner.threads -> 2) (kafka.server.DynamicConfigManager)
log.cleaner.threads = 2
[2021-11-23 21:09:50,113] INFO Shutting down the log cleaner. (kafka.log.LogCleaner)
[2021-11-23 21:09:50,114] INFO [kafka-log-cleaner-thread-0]: Shutting down (kafka.log.LogCleaner)
[2021-11-23 21:09:50,116] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)
[2021-11-23 21:09:50,116] INFO [kafka-log-cleaner-thread-0]: Shutdown completed (kafka.log.LogCleaner)
[2021-11-23 21:09:50,119] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2021-11-23 21:09:50,178] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
[2021-11-23 21:09:50,181] INFO [kafka-log-cleaner-thread-1]: Starting (kafka.log.LogCleaner)
{code}



And now after a restart, at no point does it ever start 2 threads, even though 
it clearly knows about the configs:

 
{code:java}
[2021-11-23 21:10:46,659] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2021-11-23 21:10:46,723] INFO [kafka-log-cleaner-thread-0]: Starting (kafka.log.LogCleaner)
[2021-11-23 21:10:48,124] INFO Processing override for entityPath: brokers/ with config: HashMap(log.cleaner.threads -> 2) (kafka.server.DynamicConfigManager)
log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 15000
log.cleaner.delete.retention.ms = 8640
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.max.compaction.lag.ms = 9223372036854775807
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 2{code}
 


When investigating with the kafka-configs tool, all looks well:
 
{code:java}
kafka-configs --bootstrap-server $BROKER_URL --entity-type brokers \
  --entity-default --describe --all | grep log.cleaner.threads
log.cleaner.threads=2 sensitive=false synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}{code}
 

But if you try to change the config, you soon find out that all is not well (note 
that the validation message says the current value is 1):
 
{code:java}
kafka-configs --bootstrap-server $BROKER_URL --entity-type brokers \
  --entity-default --alter --add-config log.cleaner.threads=3
Error while executing config command with args '--bootstrap-server profile_kafka:9093 --entity-type brokers --entity-default --alter --add-config log.cleaner.threads=3'
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name=''): Invalid value org.apache.kafka.common.config.ConfigException: Log cleaner threads cannot be increased to more than double the current value 1 for configuration Invalid dynamic configuration{code}
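A toy sketch of the failure mode described above (an illustration only, not the actual Kafka validation code): the resize check compares against the number of cleaner threads currently running, so after a restart that never applied the stored dynamic value, the broker validates against 1 rather than 2.

```python
def validate_resize(running_threads, requested):
    """Mimic the 'at most double the current value' dynamic config check."""
    if requested > 2 * running_threads:
        raise ValueError(
            "Log cleaner threads cannot be increased to more than double "
            f"the current value {running_threads}")
    return requested

validate_resize(1, 2)      # before the restart: 1 -> 2 is accepted, 2 threads run
try:
    validate_resize(1, 3)  # after the restart only 1 thread runs, so raising to 3 fails
except ValueError as err:
    print(err)
```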
 





[DISCUSS] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

2021-11-23 Thread Colin McCabe
Hi all,

I have written a new KIP describing an Authorizer that stores data in the 
__cluster_metadata topic. It is designed to be used in KRaft mode. Please take 
a look here: https://cwiki.apache.org/confluence/x/h5KqCw

best,
Colin


Re: [VOTE] KIP-798 Add possibility to write kafka headers in Kafka Console Producer

2021-11-23 Thread Florin Akermann
Thanks all.

The 72h window is through.

@Committers, can this vote be concluded?

The vote on KIP-798 would pass with:
4 binding +1
1 non-binding +1
no vetoes

Thanks,
Florin


On Tue, 23 Nov 2021 at 06:59, Luke Chen  wrote:

> Hi Florin,
> Thanks for the update!
>
> +1 (non-binding)
>
> Thank you.
> Luke
>
> On Tue, Nov 23, 2021 at 2:00 AM Florin Akermann  >
> wrote:
>
> > Hi Bill and David,
> >
> > Thank you both for the vote.
> > @David: KIP is updated.
> >
> > Florin
> >
> > On Mon, 22 Nov 2021 at 18:28, David Jacot 
> > wrote:
> >
> > > Hi Florin,
> > >
> > > Thanks for the KIP. I am +1 (binding).
> > >
> > > There is a small typo in the Proposed Changes section:
> > > `parse.header` should be `parse.headers`.
> > >
> > > Best,
> > > David
> > >
> > > On Mon, Nov 22, 2021 at 6:20 PM Bill Bejeck  wrote:
> > > >
> > > > Hi Florin,
> > > >
> > > > Thanks for the KIP, this seems like a very useful addition.
> > > >
> > > > +1(binding).
> > > >
> > > > -Bill
> > > >
> > > > On Mon, Nov 22, 2021 at 12:00 PM Florin Akermann <
> > > florin.akerm...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Luke and Tom
> > > > >
> > > > > @Tom: Thanks for the vote.
> > > > >
> > > > > @Luke: Thanks for the feedback.
> > > > >
> > > > > I have updated the KIP accordingly with regards to your comments on
> > the
> > > > > remaining case (false,false) and the motivation.
> > > > >
> > > > > Regarding the "not only UTF-8": As far as I understand John it is
> > fine
> > > to
> > > > > limit the scope for this change to UTF-8 only as it is a handy
> > > addition on
> > > > > its own. Other formats can be relatively easily supported by adding
> > > more
> > > > > properties in later KIPs. In my reply to John (email from 21 Nov
> > 2021,
> > > > > 11:29 UTC) I also added an explanation why I limited the scope to
> > UTF-8
> > > > > only.
> > > > >
> > > > > Thanks,
> > > > > Florin
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 22 Nov 2021 at 10:32, Tom Bentley 
> > wrote:
> > > > >
> > > > > > Hi Florin,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > >
> > > > > > +1 (binding),
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > > On Mon, Nov 22, 2021 at 6:51 AM Luke Chen 
> > wrote:
> > > > > >
> > > > > > > Hi Florin,
> > > > > > > Thanks for the KIP.
> > > > > > >
> > > > > > > This KIP makes sense to me. Just a comment that the motivation
> > > section
> > > > > is
> > > > > > > not clearly explain why this KIP is important.
> > > > > > > I think John already mentioned a good motivation, which is to
> > > support
> > > > > > "not
> > > > > > > only UTF-8".
> > > > > > > You should put that into the KIP, and of course if you have
> other
> > > > > > thoughts,
> > > > > > > please also add them into KIP.
> > > > > > >
> > > > > > > Also, in the "public interface" section, there are 3 "Default
> > > parsing
> > > > > > > pattern", I think you should add 1 remaining case (false,
> false)
> > to
> > > > > make
> > > > > > it
> > > > > > > complete.
> > > > > > >
> > > > > > > Otherwise, look good to me.
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Nov 21, 2021 at 7:37 PM Florin Akermann <
> > > > > > florin.akerm...@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi John,
> > > > > > > >
> > > > > > > > Thanks for the vote and feedback.
> > > > > > > >
> > > > > > > > The thought occurred to me too.
> > > > > > > >
> > > > > > > > Do I understand it correctly: the current version of the
> > > > > > > > kafka-console-producer cannot be used for anything other than
> > > UTF-8
> > > > > > keys
> > > > > > > > and values?
> > > > > > > > (There is no other implementation of MessageReader other than
> > the
> > > > > > > > ConsoleProducer$LineMessageReader)
> > > > > > > > In other words, currently users seem to only apply it with
> > utf-8
> > > > > > strings
> > > > > > > > for keys and values?
> > > > > > > > This is why I figured I would not deviate from this
> assumption
> > > solely
> > > > > > for
> > > > > > > > the headers.
> > > > > > > >
> > > > > > > > I will happily raise another KIP / Jira if there is a need to
> > > specify
> > > > > > > other
> > > > > > > > formats / serializers for headers, keys and/or values.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Florin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sat, 20 Nov 2021 at 19:34, John Roesler <
> > vvcep...@apache.org>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Florin,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP!
> > > > > > > > >
> > > > > > > > > I think the assumption that header values are UTF-8 strings
> > > might
> > > > > not
> > > > > > > > hold
> > > > > > > > > up in the long run, but it seems like we can easily add a
> > > property
> > > > > > > later
> > > > > > > > to
> > > > > > > > > specify the format. It seems like this scope is probably a
> > > handy
> > > > > > > addition

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-23 Thread Artem Livshits
>  maybe I can firstly decrease the "batch.max.size" to 32KB

I think 32KB is too small.  With 5 in-flight and 100ms latency we can
produce 1.6MB/s per partition.  With 256KB we can produce 12.8MB/s per
partition.  We should probably set up some testing and see if 256KB has
problems.
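Those per-partition numbers follow from batch size times in-flight requests divided by round-trip latency. A quick sanity check (decimal MB, assuming each in-flight request carries one full batch and completes in one round trip):

```python
def per_partition_mb_per_s(batch_kb, inflight=5, latency_ms=100):
    # each in-flight slot completes one batch per round trip
    round_trips_per_s = 1000 / latency_ms
    return batch_kb * inflight * round_trips_per_s / 1000  # decimal MB/s

print(per_partition_mb_per_s(32))   # 1.6
print(per_partition_mb_per_s(256))  # 12.8
```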

To illustrate latency dynamics, let's consider a simplified model: 1
in-flight request per broker, produce latency 125ms, 256KB max request
size, 16 partitions assigned to the same broker, every second 128KB is
produced to each partition (total production rate is 2MB/sec).

If the batch size is 16KB, then the pattern would be the following:

0ms - produce 128KB into each partition
0ms - take 16KB from each partition send (total 256KB)
125ms - complete first 16KB from each partition, send next 16KB
250ms - complete second 16KB, send next 16KB
...
1000ms - complete 8th 16KB from each partition

from this model it's easy to see that there are 256KB that are sent
immediately, 256KB that are sent in 125ms, ... 256KB that are sent in 875ms.

If the batch size is 256KB, then the pattern would be the following:

0ms - produce 128KB into each partition
0ms - take 128KB each from first 2 partitions and send (total 256KB)
125ms - complete 2 first partitions, send data from next 2 partitions
...
1000ms - complete last 2 partitions

even though the pattern is different, there are still 256KB that are sent
immediately, 256KB that are sent in 125ms, ... 256KB that are sent in 875ms.
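The two patterns can be checked with a short simulation of this simplified model (a sketch only, not producer code; it greedily fills each 256KB request in partition order and records when each request completes):

```python
def schedule(batch_kb, partitions=16, per_partition_kb=128,
             max_request_kb=256, latency_ms=125):
    """Return (completion_time_ms, kb_sent) for each request in the model."""
    remaining = [per_partition_kb] * partitions
    requests, now = [], 0
    while any(remaining):
        budget, sent = max_request_kb, 0
        for p in range(partitions):
            take = min(batch_kb, remaining[p], budget)
            remaining[p] -= take
            budget -= take
            sent += take
        now += latency_ms
        requests.append((now, sent))
    return requests

# both batch sizes complete the same 256 KB every 125 ms:
print(schedule(16) == schedule(256))  # True
print(schedule(16))  # eight requests of 256 KB at 125 ms intervals
```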

Now, in this example, if we do strict round-robin (the current implementation)
and we have this exact pattern (I'm not sure how often such a regular pattern
would happen in practice; I would expect it to be a bit more random), some
partitions would experience higher latency than others (not sure how much it
would matter in practice; at the end of the day, some bytes produced to a topic
would have higher latency and some bytes would have lower latency).  This
pattern is easily fixed by choosing the next partition randomly instead of
using round-robin.

-Artem

On Tue, Nov 23, 2021 at 12:08 AM Luke Chen  wrote:

> Hi Tom,
> Thanks for your comments. And thanks for Artem's explanation.
> Below is my response:
>
> > Currently because buffers are allocated using batch.size it means we can
> handle records that are that large (e.g. one big record per batch). Doesn't
> the introduction of smaller buffer sizes (batch.initial.size) mean a
> corresponding decrease in the maximum record size that the producer can
> handle?
>
> Actually, the "batch.size" is only like a threshold to decide if the batch
> is "ready to be sent". That is, even if you set the "batch.size=16KB"
> (default value), users can still send one record sized with 20KB, as long
> as the size is less than "max.request.size" in producer (default 1MB).
> Therefore, the introduction of "batch.initial.size" won't decrease the
> maximum record size that the producer can handle.
>
> > But isn't there the risk that drainBatchesForOneNode would end up not
> sending ready
> batches well past when they ought to be sent (according to their linger.ms
> ),
> because it's sending buffers for earlier partitions too aggressively?
>
> Did you mean that we have a "max.request.size" per request (default is
> 1MB), and before this KIP, the request can include 64 batches in single
> request ["batch.size"(16KB) * 64 = 1MB], but now, we might be able to
> include 32 batches or less, because we aggressively sent more records in
> one batch, is that what you meant? That's a really good point that I've
> never thought about. I think your suggestion to go through other partitions
> that just fit "batch.size", or expire "linger.ms" first, before handling
> the one that is > "batch.size" limit is not a good way, because it might
> cause the one with size > "batch.size" always in the lowest priority, and
> cause starving issue that the batch won't have chance to get sent.
>
> I don't have better solution for it, but maybe I can firstly decrease the
> "batch.max.size" to 32KB, instead of aggressively 256KB in the KIP. That
> should alleviate the problem. And still improve the throughput. What do you
> think?
>
> Thank you.
> Luke
>
> On Tue, Nov 23, 2021 at 9:04 AM Artem Livshits
>  wrote:
>
> > > I think this KIP would change the behaviour of producers when there are
> > multiple partitions ready to be sent
> >
> > This is correct, the pattern changes and becomes more coarse-grained.
> But
> > I don't think it changes fairness over the long run.  I think it's a good
> > idea to change drainIndex to be random rather than round robin to avoid
> > forming patterns where some partitions would consistently get higher
> > latencies than others because they wait longer for their turn.
> >
> > If we really wanted to preserve the exact patterns, we could either try
> to
> > support multiple 16KB batches from one partition per request (probably
> > would require protocol change to change logic on the broker for duplicate
> > detection) or try

Re: [DISCUSS] KIP-714: Client metrics and observability

2021-11-23 Thread Bob Barrett
Hi Magnus,

Thanks for the thorough KIP, this seems very useful.

Would it make sense to include the request type as a label for the
`client.request.success`, `client.request.errors` and `client.request.rtt`
metrics? I think it would be very useful to see which specific requests are
succeeding and failing for a client. One specific case I can think of where
this could be useful is producer batch timeouts. If a Java application does
not enable producer client logs (unfortunately, in my experience this
happens more often than it should), the application logs will only contain
the expiration error message, but no information about what is causing the
timeout. The requests might all be succeeding but taking too long to
process batches, or metadata requests might be failing, or some or all
produce requests might be failing (if the bootstrap servers are reachable
from the client but one or more other brokers are not, for example). If the
cluster operator is able to identify the specific requests that are slow or
failing for a client, they will be better able to diagnose the issue
causing batch timeouts.

One drawback I can think of is that this will increase the cardinality of
the request metrics. But any given client is only going to use a small
subset of the request types, and since we already have partition labels for
the topic-level metrics, I think request labels will still make up a
relatively small percentage of the set of metrics.

Thanks,
Bob

On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass 
wrote:

> Hi Magnus,
>
> I think this is a very useful addition. We also have a similar (but much
> more simplistic) implementation of this. Maybe I missed it in the KIP but
> what about adding metrics about the subscription cache itself? That I think
> would improve its usability and debuggability as we'd be able to see its
> performance, hit/miss rates, eviction counts and others.
>
> Best,
> Viktor
>
> On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill 
> wrote:
>
> > Hi Mickael,
> >
> > see inline.
> >
> > Den ons 10 nov. 2021 kl 15:21 skrev Mickael Maison <
> > mickael.mai...@gmail.com
> > >:
> >
> > > Hi Magnus,
> > >
> > > I see you've addressed some of the points I raised above but some (4,
> > > 5) have not been addressed yet.
> > >
> >
> > Re 4) How will the user/app know metrics are being sent.
> >
> > One possibility is to add a JMX metric (thus for user consumption) for
> the
> > number of metric pushes the
> > client has performed, or perhaps the number of metrics subscriptions
> > currently being collected.
> > Would that be sufficient?
> >
> > Re 5) Metric sizes and rates
> >
> > A worst case scenario for a producer that is producing to 50 unique
> topics
> > and emitting all standard metrics yields
> > a serialized size of around 100KB prior to compression, which compresses
> > down to about 20-30% of that depending
> > on compression type and topic name uniqueness.
> > The numbers for a consumer would be similar.
> >
> > In practice the number of unique topics would be far less, and the
> > subscription set would typically be for a subset of metrics.
> > So we're probably closer to 1kb, or less, compressed size per client per
> > push interval.
> >
> > As both the subscription set and push intervals are controlled by the
> > cluster operator it shouldn't be too hard
> > to strike a good balance between metrics overhead and granularity.
> >
> >
> >
> > >
> > > I'm really uneasy with this being enabled by default on the client
> > > side. When collecting data, I think the best practice is to ensure
> > > users are explicitly enabling it.
> > >
> >
> > Requiring metrics to be explicitly enabled on clients severely cripples
> its
> > usability and value.
> >
> > One of the problems that this KIP aims to solve is for useful metrics to
> be
> > available on demand
> > regardless of the technical expertise of the user. As Ryanne points, out
> a
> > savvy user/organization
> > will typically have metrics collection and monitoring in place already,
> and
> > the benefits of this KIP
> > are then more of a common set and format metrics across client
> > implementations and languages.
> > But that is not the typical Kafka user in my experience, they're not
> Kafka
> > experts and they don't have the
> > knowledge of how to best instrument their clients.
> > Having metrics enabled by default for this user base allows the Kafka
> > operators to proactively and reactively
> > monitor and troubleshoot client issues, without the need for the less
> savvy
> > user to do anything.
> > It is often too late to tell a user to enable metrics when the problem
> has
> > already occurred.
> >
> > Now, to be clear, even though metrics are enabled by default on clients
> it
> > is not enabled by default
> > on the brokers; the Kafka operator needs to build and set up a metrics
> > plugin and add metrics subscriptions
> > before anything is sent from the client.
> > It is opt-out on the clients and opt-in on the broker.
> >

Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-11-23 Thread Mickael Maison
Hi David,

Can we also consider https://issues.apache.org/jira/browse/KAFKA-13397?
It's essentially a regression but in a very specific case. To hit it,
you must be running MirrorMaker in dedicated mode and have changed the
separator of the default replication policy.

Thanks,
Mickael

On Tue, Nov 23, 2021 at 4:58 PM David Jacot  wrote:
>
> Hi Ron,
>
> Thank you for reaching out about this. While this is clearly not a
> regression, I agree with including it in 3.1 in order to have proper
> and correct configuration constraints for KRaft. You can proceed.
>
> Cheers,
> David
>
> On Tue, Nov 23, 2021 at 2:55 PM Ron Dagostino  wrote:
> >
> > Hi David.  I would like to nominate
> > https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13456
> > "Tighten KRaft config checks/constraints" as a 3.1.0 blocker.  The
> > existing configuration constraints/checks related to KRaft currently
> > do not eliminate certain illegal configuration combinations. The
> > illegal combinations do not cause harm at the moment, but we would
> > like to implement constraints in 3.1.0 to catch them while KRaft is
> > still in Preview.  We could add these additional checks later in 3.2.x
> > instead, but we would like to add these as early as possible: we
> > expect more people to begin trying KRaft with each subsequent release,
> > and it would be best to eliminate as quickly as we can the possibility
> > of people using configurations that would need fixing later.
> >
> > A patch is available at https://github.com/apache/kafka/pull/11503/.
> >
> > Ron
> >
> >
> > On Tue, Nov 23, 2021 at 3:19 AM David Jacot  
> > wrote:
> > >
> > > Hi Chris,
> > >
> > > Thanks for reporting both issues. As both are regressions, I do agree that
> > > they are blockers and that we would fix them for 3.1.
> > >
> > > Cheers,
> > > David
> > >
> > > On Mon, Nov 22, 2021 at 10:50 PM Chris Egerton
> > >  wrote:
> > > >
> > > > Hi David,
> > > >
> > > > I have another blocker to propose. KAFKA-13472 (
> > > > https://issues.apache.org/jira/browse/KAFKA-13472) is another 
> > > > regression in
> > > > Connect caused by recently-merged changes for KAFKA-12487 (
> > > > https://issues.apache.org/jira/browse/KAFKA-12487) which can lead to 
> > > > data
> > > > loss in sink connectors in some rare edge cases. I've opened a fix PR (
> > > > https://github.com/apache/kafka/pull/11526) already, and have also 
> > > > opened a
> > > > fix PR (https://github.com/apache/kafka/pull/11524) for the 
> > > > aforementioned
> > > > KAFKA-13469.
> > > >
> > > > Please let me know if we can merge a fix for this in time for the 3.1.0
> > > > release; if not, as with KAFKA-13469, we may want to revert the changes 
> > > > for
> > > > the PR that cause this issue (in this case, that'd be the PR for
> > > > KAFKA-12487).
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Mon, Nov 22, 2021 at 11:42 AM Chris Egerton  
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > I'd like to propose KAFKA-13469 (
> > > > > https://issues.apache.org/jira/browse/KAFKA-13469) as a blocker. It 
> > > > > is a
> > > > > regression in Connect caused by recently-merged changes for 
> > > > > KAFKA-12226 (
> > > > > https://issues.apache.org/jira/browse/KAFKA-12226) which leads to
> > > > > duplicate records for source tasks. I plan to have a fix PR opened by 
> > > > > the
> > > > > end of the day.
> > > > >
> > > > > Please let me know if we can merge a fix for this in time for the 
> > > > > 3.1.0
> > > > > release; if not, we may want to revert the changes for KAFKA-12226.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Mon, Nov 15, 2021 at 5:02 AM David Jacot 
> > > > > 
> > > > > wrote:
> > > > >
> > > > >> Hi folks,
> > > > >>
> > > > >> We reached the code freeze for the Apache Kafka 3.1 release on 
> > > > >> Friday.
> > > > >> Therefore,
> > > > >> we will only accept blockers from now on.
> > > > >>
> > > > >> There already are a couple of blockers identified which were not
> > > > >> completed before
> > > > >> the code freeze. Please, raise any new blockers to this thread.
> > > > >>
> > > > >> For all the non-blocker issues targeting 3.1.0, I will move them to
> > > > >> the next release.
> > > > >>
> > > > >> Cheers,
> > > > >> David
> > > > >>
> > > > >> On Fri, Oct 29, 2021 at 12:20 PM Dongjin Lee  
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi David,
> > > > >> >
> > > > >> > Please update the components of the following KIPs:
> > > > >> >
> > > > >> > - KIP-390: Support Compression Level - Core, Clients
> > > > >> > - KIP-653: Upgrade log4j to log4j2 - Clients, Connect, Core, 
> > > > >> > Streams
> > > > >> (that
> > > > >> > is, Log4j-appender, Tools, and Trogdor are excluded.)
> > > > >> >
> > > > >> > Best,
> > > > >> > Dongjin
> > > > >> >
> > > > >> > On Fri, Oct 29, 2021 at 2:24 AM Chris Egerton
> > > > >> 
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi David,
> > > > >> > >
> > > > >> > > I've moved KIP-618 to

Re: KIP-769: Connect API to retrieve connector configuration definitions

2021-11-23 Thread Chris Egerton
Hi Mickael,

I think the increase in scope here is great and the added value certainly
justifies the proposed changes. I have some thoughts but overall I like the
direction this is going in now.

1. The new /plugins endpoint is described as containing "all plugins that
are Connectors, Transformations, Converters, HeaderConverters and
Predicates". So essentially, it looks like we want to expose all plugins
that are configured on a per-connector basis, but exclude plugins that are
configured on a per-worker basis (such as config providers and REST
extensions). Do you think it may be valuable to expose information on
worker-level plugins as well?

2. The description for the new /plugins endpoint also states that "Plugins
will be grouped by plugin.path. This will make it clear to users what's
available to use as it's not possible to use a Connector from one path with
Transformations from another.". Is this true? I thought that Connect's
classloading made it possible to package
converters/transformations/predicates completely independently from each
other, and to reference them from also-independently-packaged connectors.
If it turns out that this is the case, could we consider restructuring the
response to be grouped by plugin type instead of by classloader? There's
also the ungrouped format proposed in KIP-494 (
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120740150)
which we might consider as well.

3. I think this can be left for a follow-up KIP if necessary, but I'm
curious about your thoughts on adding new validate methods to all
connector-level plugins that can be used similarly to how the existing
Connector::validate method (
https://github.com/apache/kafka/blob/1e0916580f16b99b911b0ed36e9740dcaeef520e/connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java#L131-L146)
is used. This would allow for plugins to perform validation that's more
sophisticated than what the ConfigDef is capable of, such as validating
combinations of properties like a hostname and credentials for reaching it.
I know that at least Confluent's Avro, protobuf, and JSON schema converters
would benefit from this kind of feature. It's a little tangential to this
KIP (which at the moment is about discovering plugins and their
configuration surfaces, as opposed to validating them), but I figured I'd
ask since we're going to be expanding the Converter interface and it may be
useful to tackle this while we're in the neighborhood.
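
The kind of cross-property check described above is exactly what a per-key ConfigDef validator cannot express, since each validator only sees one value at a time. As a minimal Python sketch (all names here are hypothetical, not the Connect API), a plugin-level validate method would receive the whole config map and could report combination errors per property:

```python
# Hypothetical sketch of combination validation beyond per-key validators:
# the function sees the whole config map, not a single value.

def validate_config(config):
    """Return a dict mapping property name -> list of error messages."""
    errors = {"hostname": [], "username": [], "password": []}

    hostname = config.get("hostname")
    username = config.get("username")
    password = config.get("password")

    if not hostname:
        errors["hostname"].append("hostname is required")
    # A combination check: credentials must be given together.
    if (username is None) != (password is None):
        for key in ("username", "password"):
            errors[key].append("username and password must be set together")
    # Keep only the properties that actually have errors.
    return {k: v for k, v in errors.items() if v}

# Example: a username without a password fails the combination check.
print(validate_config({"hostname": "db.example.com", "username": "admin"}))
```

A per-key `Validator` could never flag the username/password pairing, because neither value is invalid on its own.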

4. The description for the new /plugins///configdef endpoint
states that "Name must be the fully qualified class name of the plugin".
Any reason not to also support aliases (e.g., "FileStreamSinkConnector" or
"FileStreamSink" instead of
"org.apache.kafka.connect.file.FileStreamSinkConnector")?

Cheers,

Chris

On Tue, Nov 23, 2021 at 12:07 PM Mickael Maison 
wrote:

> Thanks all for the feedback!
>
> Chris,
> I agree that fixing the current endpoint helps a lot. Thanks for
> raising these JIRAs and submitting a PR!
> However thinking about the issue further, I decided to expand the
> scope of the KIP to cover all user-visible plugins.
> In practice, users want to know about all available plugins not only
> connectors. This includes transformations, converters,
> header_converters and predicates. As we also want to retrieve
> configdef for these too, I think it makes sense to introduce a new
> endpoint to do so. Alongside we obviously need a new endpoint for
> listing all plugins.
>
> Gunnar,
> I took a look at exposing valid values via the API. I think the issue
> is that Validators don't expose a way to retrieve valid values.
> Changing validators will have an impact on all components so I'd
> prefer to address this requirement in a separate KIP. I agree this
> would be an interesting improvement and I'd be happy to write a KIP for
> it too.
>
>
> I have updated the KIP accordingly. Let me know if you have further
> feedback.
>
> Thanks,
> Mickael
>
> On Tue, Nov 16, 2021 at 9:31 PM Gunnar Morling
>  wrote:
> >
> > Hi,
> >
> > I'm +1 for adding a GET endpoint for obtaining config definitions. It
> > always felt odd to me that one has to issue a PUT for that purpose. If
> > nothing else, it'd be better in terms of discoverability of the KC REST
> API.
> >
> > One additional feature request I'd have is to expose the valid enum
> > constants for enum-typed options. That'll help to display the values in a
> > drop-down or via radio buttons in a UI, give us tab completion in kcctl,
> > etc.
> >
> > Best,
> >
> > --Gunnar
> >
> >
> > Am Di., 16. Nov. 2021 um 16:31 Uhr schrieb Chris Egerton
> > :
> >
> > > Hi Viktor,
> > >
> > > It sounds like there are three major points here in favor of a new GET
> > > endpoint for connector config defs.
> > >
> > > 1. You cannot issue a blank ("dummy") request for sink connectors
> because a
> > > topic list/topic regex has to be supplied (otherwise the PUT endpoint
> > > returns a 500 response)
> > > 2. A dummy request still triggers custom val

Re: KIP-769: Connect API to retrieve connector configuration definitions

2021-11-23 Thread Mickael Maison
Thanks all for the feedback!

Chris,
I agree that fixing the current endpoint helps a lot. Thanks for
raising these JIRAs and submitting a PR!
However, thinking about the issue further, I decided to expand the
scope of the KIP to cover all user-visible plugins.
In practice, users want to know about all available plugins, not only
connectors. This includes transformations, converters,
header_converters and predicates. As we also want to retrieve the
configdef for these too, I think it makes sense to introduce a new
endpoint to do so. Alongside it, we obviously need a new endpoint for
listing all plugins.

Gunnar,
I took a look at exposing valid values via the API. I think the issue
is that Validators don't expose a way to retrieve valid values.
Changing validators will have an impact on all components so I'd
prefer to address this requirement in a separate KIP. I agree this
would be an interesting improvement and I'd be happy to write a KIP for
it too.
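
To sketch what such an improvement could look like (hypothetical names, not the actual Kafka ConfigDef API), a validator would only need one extra method that enumerates its valid values so a REST endpoint could surface them:

```python
# Hypothetical sketch: a ValidString-style validator extended with a way
# to report its valid values, which is the piece missing today.

class ValidString:
    def __init__(self, *valid_values):
        self.valid_values = list(valid_values)

    def ensure_valid(self, name, value):
        """Raise if the value is not one of the allowed strings."""
        if value not in self.valid_values:
            raise ValueError(
                f"{name} must be one of {self.valid_values}, got {value!r}"
            )

    # The proposed addition: enumerate the valid values for API consumers.
    def describe_valid_values(self):
        return list(self.valid_values)

compression = ValidString("none", "gzip", "snappy", "lz4", "zstd")
compression.ensure_valid("compression.type", "gzip")  # passes silently
print(compression.describe_valid_values())
# ['none', 'gzip', 'snappy', 'lz4', 'zstd']
```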


I have updated the KIP accordingly. Let me know if you have further feedback.

Thanks,
Mickael

On Tue, Nov 16, 2021 at 9:31 PM Gunnar Morling
 wrote:
>
> Hi,
>
> I'm +1 for adding a GET endpoint for obtaining config definitions. It
> always felt odd to me that one has to issue a PUT for that purpose. If
> nothing else, it'd be better in terms of discoverability of the KC REST API.
>
> One additional feature request I'd have is to expose the valid enum
> constants for enum-typed options. That'll help to display the values in a
> drop-down or via radio buttons in a UI, give us tab completion in kcctl,
> etc.
>
> Best,
>
> --Gunnar
>
>
> Am Di., 16. Nov. 2021 um 16:31 Uhr schrieb Chris Egerton
> :
>
> > Hi Viktor,
> >
> > It sounds like there are three major points here in favor of a new GET
> > endpoint for connector config defs.
> >
> > 1. You cannot issue a blank ("dummy") request for sink connectors because a
> > topic list/topic regex has to be supplied (otherwise the PUT endpoint
> > returns a 500 response)
> > 2. A dummy request still triggers custom validations by the connector,
> > which may be best to avoid if we know for sure that the config isn't worth
> > validating yet
> > 3. It's more ergonomic and intuitive to be able to issue a GET request
> > without having to give a dummy connector config
> >
> > With regards to 1, this is actually a bug in Connect (
> > https://issues.apache.org/jira/browse/KAFKA-13327) with a fix already
> > implemented and awaiting committer review (
> > https://github.com/apache/kafka/pull/11369). I think it'd be better to
> > focus on fixing this bug in general instead of implementing a new REST
> > endpoint in order to allow people to work around it.
> >
> > With regards to 2, this is technically possible but I'm unsure it'd be too
> > common out in the wild given that most validations that could be expensive
> > would involve things like connecting to a database, checking if a cloud
> > storage bucket exists, etc., none of which are possible without some
> > configuration properties from the user (db hostname, bucket name, etc.).
> >
> > With regards to 3, I do agree that it'd be easier for people designing UIs
> > to have a GET API to work against. I'm just not sure it's worth the
> > additional implementation, testing, and maintenance burden. If it were
> > possible to issue a PUT request without unexpected 500s for invalid
> > configs, would that suffice? AFAICT it'd basically be as simple as issuing
> > a PUT request with a dummy body consisting of nothing except the connector
> > class (which at this point we might even make unnecessary and just
> > automatically replace with the connector class from the URL) and then
> > filtering the response to just grab the "definition" field of each element
> > in the "configs" array in the response.
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Nov 16, 2021 at 9:52 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > I too think this would be a very useful feature. Some of our management
> > > applications would provide a wizard for creating connectors. In this
> > > scenario the user basically would fill out a sample configuration
> > generated
> > > by the UI which would send it back to Connect for validation and
> > eventually
> > > create a new connector. The first part of this workflow can be enhanced
> > if
> > > we had an API that can return the configuration definition of the given
> > > type of connector as the UI application would be able to generate a
> > sample
> > > for the user based on that (nicely drawn diagram:
> > > https://imgur.com/a/7S1Xwm5).
> > > The connector-plugins/{connectorType}/config/validate API essentially
> > works
> > > and returns the data that we need, however it is a HTTP PUT API that is a
> > > bit unintuitive for a fetch-like functionality and also functionally
> > > different as it validates the given (dummy) request. In case of sink
> > > connectors one would need to also provide a topic name.
> > >
> 
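The workaround Chris describes above — PUT a minimal dummy body to the validate endpoint and keep only each element's "definition" field from the "configs" array — can be sketched as follows. The response below is illustrative sample data shaped like a Connect validate response, not actual Connect output:

```python
# Sketch of filtering a PUT /connector-plugins/{class}/config/validate
# response down to just the config definitions (sample data only).

sample_validate_response = {
    "name": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "error_count": 0,
    "configs": [
        {
            "definition": {"name": "topics", "type": "LIST", "required": False},
            "value": {"name": "topics", "value": "", "errors": []},
        },
        {
            "definition": {"name": "file", "type": "STRING", "required": False},
            "value": {"name": "file", "value": None, "errors": []},
        },
    ],
}

def config_defs(validate_response):
    """Strip the per-value validation results, keeping only definitions."""
    return [entry["definition"] for entry in validate_response["configs"]]

for definition in config_defs(sample_validate_response):
    print(definition["name"], definition["type"])
# topics LIST
# file STRING
```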

Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-11-23 Thread David Jacot
Hi Ron,

Thank you for reaching out about this. While this is clearly not a
regression, I agree with including it in 3.1 in order to have proper
and correct configuration constraints for KRaft. You can proceed.

Cheers,
David

On Tue, Nov 23, 2021 at 2:55 PM Ron Dagostino  wrote:
>
> Hi David.  I would like to nominate
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13456
> "Tighten KRaft config checks/constraints" as a 3.1.0 blocker.  The
> existing configuration constraints/checks related to KRaft currently
> do not eliminate certain illegal configuration combinations. The
> illegal combinations do not cause harm at the moment, but we would
> like to implement constraints in 3.1.0 to catch them while KRaft is
> still in Preview.  We could add these additional checks later in 3.2.x
> instead, but we would like to add these as early as possible: we
> expect more people to begin trying KRaft with each subsequent release,
> and it would be best to eliminate as quickly as we can the possibility
> of people using configurations that would need fixing later.
>
> A patch is available at https://github.com/apache/kafka/pull/11503/.
>
> Ron
>
>
> On Tue, Nov 23, 2021 at 3:19 AM David Jacot  
> wrote:
> >
> > Hi Chris,
> >
> > Thanks for reporting both issues. As both are regressions, I do agree that
> > they are blockers and that we would fix them for 3.1.
> >
> > Cheers,
> > David
> >
> > On Mon, Nov 22, 2021 at 10:50 PM Chris Egerton
> >  wrote:
> > >
> > > Hi David,
> > >
> > > I have another blocker to propose. KAFKA-13472 (
> > > https://issues.apache.org/jira/browse/KAFKA-13472) is another regression 
> > > in
> > > Connect caused by recently-merged changes for KAFKA-12487 (
> > > https://issues.apache.org/jira/browse/KAFKA-12487) which can lead to data
> > > loss in sink connectors in some rare edge cases. I've opened a fix PR (
> > > https://github.com/apache/kafka/pull/11526) already, and have also opened 
> > > a
> > > fix PR (https://github.com/apache/kafka/pull/11524) for the aforementioned
> > > KAFKA-13469.
> > >
> > > Please let me know if we can merge a fix for this in time for the 3.1.0
> > > release; if not, as with KAFKA-13469, we may want to revert the changes 
> > > for
> > > the PR that cause this issue (in this case, that'd be the PR for
> > > KAFKA-12487).
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Mon, Nov 22, 2021 at 11:42 AM Chris Egerton  
> > > wrote:
> > >
> > > > Hi David,
> > > >
> > > > I'd like to propose KAFKA-13469 (
> > > > https://issues.apache.org/jira/browse/KAFKA-13469) as a blocker. It is a
> > > > regression in Connect caused by recently-merged changes for KAFKA-12226 
> > > > (
> > > > https://issues.apache.org/jira/browse/KAFKA-12226) which leads to
> > > > duplicate records for source tasks. I plan to have a fix PR opened by 
> > > > the
> > > > end of the day.
> > > >
> > > > Please let me know if we can merge a fix for this in time for the 3.1.0
> > > > release; if not, we may want to revert the changes for KAFKA-12226.
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Mon, Nov 15, 2021 at 5:02 AM David Jacot 
> > > > 
> > > > wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> We reached the code freeze for the Apache Kafka 3.1 release on Friday.
> > > >> Therefore,
> > > >> we will only accept blockers from now on.
> > > >>
> > > >> There already are a couple of blockers identified which were not
> > > >> completed before
> > > >> the code freeze. Please, raise any new blockers to this thread.
> > > >>
> > > >> For all the non-blocker issues targeting 3.1.0, I will move them to
> > > >> the next release.
> > > >>
> > > >> Cheers,
> > > >> David
> > > >>
> > > >> On Fri, Oct 29, 2021 at 12:20 PM Dongjin Lee  
> > > >> wrote:
> > > >> >
> > > >> > Hi David,
> > > >> >
> > > >> > Please update the components of the following KIPs:
> > > >> >
> > > >> > - KIP-390: Support Compression Level - Core, Clients
> > > >> > - KIP-653: Upgrade log4j to log4j2 - Clients, Connect, Core, Streams
> > > >> (that
> > > >> > is, Log4j-appender, Tools, and Trogdor are excluded.)
> > > >> >
> > > >> > Best,
> > > >> > Dongjin
> > > >> >
> > > >> > On Fri, Oct 29, 2021 at 2:24 AM Chris Egerton
> > > >> 
> > > >> > wrote:
> > > >> >
> > > >> > > Hi David,
> > > >> > >
> > > >> > > I've moved KIP-618 to the "postponed" section as it will not be
> > > >> merged in
> > > >> > > time due to lack of review.
> > > >> > >
> > > >> > > Cheers,
> > > >> > >
> > > >> > > Chris
> > > >> > >
> > > >> > > On Thu, Oct 28, 2021 at 1:07 PM David Jacot
> > > >> 
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi team,
> > > >> > > >
> > > >> > > > Just a quick reminder that the Feature freeze is tomorrow 
> > > >> > > > (October
> > > >> 29th).
> > > >> > > > In order to be fair with everyone in all the time zones, I plan 
> > > >> > > > to
> > > >> cut
> > > >> > > the
> > > >> > > > release branch early next week.
> > > >> > > >
> > > >> > > > Cheers,
> > > >> > > >

Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-11-23 Thread Ron Dagostino
Hi David.  I would like to nominate
https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13456
"Tighten KRaft config checks/constraints" as a 3.1.0 blocker.  The
existing configuration constraints/checks related to KRaft currently
do not eliminate certain illegal configuration combinations. The
illegal combinations do not cause harm at the moment, but we would
like to implement constraints in 3.1.0 to catch them while KRaft is
still in Preview.  We could add these additional checks later in 3.2.x
instead, but we would like to add these as early as possible: we
expect more people to begin trying KRaft with each subsequent release,
and it would be best to eliminate as quickly as we can the possibility
of people using configurations that would need fixing later.

A patch is available at https://github.com/apache/kafka/pull/11503/.

Ron


On Tue, Nov 23, 2021 at 3:19 AM David Jacot  wrote:
>
> Hi Chris,
>
> Thanks for reporting both issues. As both are regressions, I do agree that
> they are blockers and that we would fix them for 3.1.
>
> Cheers,
> David
>
> On Mon, Nov 22, 2021 at 10:50 PM Chris Egerton
>  wrote:
> >
> > Hi David,
> >
> > I have another blocker to propose. KAFKA-13472 (
> > https://issues.apache.org/jira/browse/KAFKA-13472) is another regression in
> > Connect caused by recently-merged changes for KAFKA-12487 (
> > https://issues.apache.org/jira/browse/KAFKA-12487) which can lead to data
> > loss in sink connectors in some rare edge cases. I've opened a fix PR (
> > https://github.com/apache/kafka/pull/11526) already, and have also opened a
> > fix PR (https://github.com/apache/kafka/pull/11524) for the aforementioned
> > KAFKA-13469.
> >
> > Please let me know if we can merge a fix for this in time for the 3.1.0
> > release; if not, as with KAFKA-13469, we may want to revert the changes for
> > the PR that cause this issue (in this case, that'd be the PR for
> > KAFKA-12487).
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Nov 22, 2021 at 11:42 AM Chris Egerton  wrote:
> >
> > > Hi David,
> > >
> > > I'd like to propose KAFKA-13469 (
> > > https://issues.apache.org/jira/browse/KAFKA-13469) as a blocker. It is a
> > > regression in Connect caused by recently-merged changes for KAFKA-12226 (
> > > https://issues.apache.org/jira/browse/KAFKA-12226) which leads to
> > > duplicate records for source tasks. I plan to have a fix PR opened by the
> > > end of the day.
> > >
> > > Please let me know if we can merge a fix for this in time for the 3.1.0
> > > release; if not, we may want to revert the changes for KAFKA-12226.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Mon, Nov 15, 2021 at 5:02 AM David Jacot 
> > > wrote:
> > >
> > >> Hi folks,
> > >>
> > >> We reached the code freeze for the Apache Kafka 3.1 release on Friday.
> > >> Therefore,
> > >> we will only accept blockers from now on.
> > >>
> > >> There already are a couple of blockers identified which were not
> > >> completed before
> > >> the code freeze. Please, raise any new blockers to this thread.
> > >>
> > >> For all the non-blocker issues targeting 3.1.0, I will move them to
> > >> the next release.
> > >>
> > >> Cheers,
> > >> David
> > >>
> > >> On Fri, Oct 29, 2021 at 12:20 PM Dongjin Lee  wrote:
> > >> >
> > >> > Hi David,
> > >> >
> > >> > Please update the components of the following KIPs:
> > >> >
> > >> > - KIP-390: Support Compression Level - Core, Clients
> > >> > - KIP-653: Upgrade log4j to log4j2 - Clients, Connect, Core, Streams
> > >> (that
> > >> > is, Log4j-appender, Tools, and Trogdor are excluded.)
> > >> >
> > >> > Best,
> > >> > Dongjin
> > >> >
> > >> > On Fri, Oct 29, 2021 at 2:24 AM Chris Egerton
> > >> 
> > >> > wrote:
> > >> >
> > >> > > Hi David,
> > >> > >
> > >> > > I've moved KIP-618 to the "postponed" section as it will not be
> > >> merged in
> > >> > > time due to lack of review.
> > >> > >
> > >> > > Cheers,
> > >> > >
> > >> > > Chris
> > >> > >
> > >> > > On Thu, Oct 28, 2021 at 1:07 PM David Jacot
> > >> 
> > >> > > wrote:
> > >> > >
> > >> > > > Hi team,
> > >> > > >
> > >> > > > Just a quick reminder that the Feature freeze is tomorrow (October
> > >> 29th).
> > >> > > > In order to be fair with everyone in all the time zones, I plan to
> > >> cut
> > >> > > the
> > >> > > > release branch early next week.
> > >> > > >
> > >> > > > Cheers,
> > >> > > > David
> > >> > > >
> > >> > > > On Mon, Oct 18, 2021 at 9:56 AM David Jacot 
> > >> wrote:
> > >> > > >
> > >> > > > > Hi team,
> > >> > > > >
> > >> > > > > KIP freeze for the next major release of Apache Kafka was reached
> > >> > > > > last week.
> > >> > > > >
> > >> > > > > I have updated the release plan with all the adopted KIPs which
> > >> are
> > >> > > > > considered
> > >> > > > > for AK 3.1.0. Please, verify the plan and let me know if any KIP
> > >> should
> > >> > > > be
> > >> > > > > added
> > >> > > > > to or removed from the release plan.
> > >> > > > >
> > >> > > > > For the KIPs which are still in progress, plea

Re: [VOTE] KIP-780: Support fine-grained compression options

2021-11-23 Thread Dongjin Lee
Bumping up the voting thread. As of now:

- binding: 0
- non-binding: 1 (Luke)

Thanks,
Dongjin

On Wed, Oct 27, 2021 at 9:47 PM Luke Chen  wrote:

> Hi Dongjin,
> Thanks for the KIP.
> +1 (non-binding)
>
> Luke
>
> On Wed, Oct 27, 2021 at 8:44 PM Dongjin Lee  wrote:
>
> > Bumping up the voting thread.
> >
> > If you have any questions or opinions, don't hesitate to leave them in
> the
> > discussion thread.
> >
> > Best,
> > Dongjin
> >
> > On Thu, Oct 14, 2021 at 3:02 AM Dongjin Lee  wrote:
> >
> > > Hi, Kafka dev,
> > >
> > > I'd like to open a vote for KIP-780: Support fine-grained compression
> > > options:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> > >
> > > Please note that this feature mutually complements KIP-390: Support
> > > Compression Level (accepted, targeted to 3.1.0.). It was initially
> > planned
> > > for a part of KIP-390 but spun off for performance concerns.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> > >
> > > Best,
> > > Dongjin
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > >
> > >
> > > *github:  github.com/dongjinleekr
> > > keybase:
> > https://keybase.io/dongjinleekr
> > > linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > speakerdeck:
> > speakerdeck.com/dongjin
> > > *
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> >
> >
> > *github:  github.com/dongjinleekr
> > keybase:
> https://keybase.io/dongjinleekr
> > linkedin:
> kr.linkedin.com/in/dongjinleekr
> > speakerdeck:
> > speakerdeck.com/dongjin
> > *
> >
>
-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*



github: github.com/dongjinleekr
keybase: https://keybase.io/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin


Re: [DISCUSS] KIP-719: Add Log4J2 Appender

2021-11-23 Thread Dongjin Lee
Hi Mickael,

I also thought over the issue thoroughly and would like to propose a minor
change to your proposal:

1. Deprecate log4j-appender now
2. Document how to migrate into logging-log4j2
3. (Changed) Replace the log4j-appender (and, in turn, log4j 1.x) dependencies
in tools, trogdor, and shell, and upgrade to log4j2 in 3.x, removing the
log4j 1.x dependencies.
4. (Changed) Remove log4j-appender in Kafka 4.0

All the log4j2 upgrade requires is removing the log4j dependencies, since
they can cause a classpath error. And we can actually do that without
discontinuing publication of the log4j-appender artifact. So, I suggest
separating the upgrade to log4j2 from the removal of the log4j-appender
module.

What do you think? If you agree, I will update the KIP and the PR
accordingly ASAP.

Thanks,
Dongjin

On Mon, Nov 15, 2021 at 8:06 PM Mickael Maison 
wrote:

> Hi Dongjin,
>
> Thanks for the clarifications.
>
> I wonder if a simpler course of action could be:
> - Deprecate log4j-appender now
> - Document how to use logging-log4j2
> - Remove log4j-appender and all the log4j dependencies in Kafka 4.0
>
> This delays KIP-653 till Kafka 4.0 but (so far) Kafka is not directly
> affected by the log4j CVEs. At least this gives us a clear and simple
> roadmap to follow.
>
> What do you think?
>
> On Tue, Nov 9, 2021 at 12:12 PM Dongjin Lee  wrote:
> >
> > Hi Mickael,
> >
> > I greatly appreciate you for reading the proposal so carefully! I wrote
> it
> > quite a while ago and rechecked it today.
> >
> > > Is the KIP proposing to replace the existing log4-appender or simply
> add
> > a new one for log4j2? Reading the KIP and with its current title, it's
> not
> > entirely explicit.
> >
> > Oh, After re-reading it, I realized that this is not clear. Let me
> clarify;
> >
> > 1. Provide a lo4j2 equivalent of traditional log4j-appender,
> > log4j2-appender.
> > 2. Migrate the modules depending on log4j-appender (i.e., tools, trogdor,
> > shell) into log4j2-appender, removing log4j-appender from dependencies.
> > 3. Entirely remove log4j-appender from the project dependencies, along
> with
> > log4j.
> >
> > I think log4j-appender may be published for every new release like
> before,
> > but the committee should make a decision on the policy.
> >
> > > Under Rejected Alternative, the KIP states: "the Kafka appender
> provided
> > by log4j2 community stores log message in the Record key". Looking at the
> > code, it looks like the log message is stored in the Record value:
> >
> https://github.com/apache/logging-log4j2/blob/master/log4j-kafka/src/main/java/org/apache/logging/log4j/kafka/appender/KafkaManager.java#L135
> > Am I missing something?
> >
> > It's totally my fault; I confused it with another appender. The
> > compatibility problem in the logging-log4j2 Kafka appender is not the
> > format but the configuration. logging-log4j2 Kafka appender supports
> > `properties` configuration, which will be directly used to instantiate a
> > Kafka producer. However, log4j-appender has been using non-producer
> config
> > names like brokerList (=bootstrap.servers), requiredNumAcks (=acks).
> > Instead, logging-log4j2 Kafka appender supports retryCount,
> > sendEventTimestamp.
> >
> > On second thought, using logging-log4j2 Kafka appender internally and
> > making log4j2-appender to focus on compatibility facade only would be a
> > better approach; As I described above, the goal of this module is just
> > keeping the backward-compatibility, and (as you pointed out) the current
> > implementation has little value. Since
> org.apache.logging.log4j:log4j-core
> > already includes Kafka appender, we can make use of the 'proven wheel'
> > without adding more dependencies. I have not tried it yet, but I think it
> > is well worth it. (One additional advantage of this approach is
> providing a
> > bridge to the users who hope to move from/into logging-log4j2 Kafka
> > appender.)
> >
> > > As the current log4j-appender is not even deprecated yet, in theory we
> > can't remove it till Kafka 4. If we want to speed up the process, I
> wonder
> > if the lack of documentation and a migration guide could help us. What do
> > you think?
> >
> > In fact, this is what I am doing nowadays. While working with
> > log4j-appender, I found that despite a lack of documentation,
> considerable
> > users are already using it[^1][^2][^3][^4][^5]. So, I think providing a
> > documentation to those who are already using log4j-appender is
> > indispensable. It should include:
> >
> > - What is the difference between log4j-appender vs. log4j2-appender.
> > - Which options are supported and deprecated.
> > - Exemplar configurations that show how to migrate.
> >
> > Here is the summary:
> >
> > 1. The goal of this proposal is to replace the traditional log4j-appender
> > for compatibility concerns. But log4j-appender may be published after the
> > deprecation.
> > 2. As of present, the description about logging-log4j2 Kafka appender is
> > entirely wrong. The 

Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-11-23 Thread David Jacot
Hi Chris,

Thanks for reporting both issues. As both are regressions, I do agree that
they are blockers and that we would fix them for 3.1.

Cheers,
David

On Mon, Nov 22, 2021 at 10:50 PM Chris Egerton
 wrote:
>
> Hi David,
>
> I have another blocker to propose. KAFKA-13472 (
> https://issues.apache.org/jira/browse/KAFKA-13472) is another regression in
> Connect caused by recently-merged changes for KAFKA-12487 (
> https://issues.apache.org/jira/browse/KAFKA-12487) which can lead to data
> loss in sink connectors in some rare edge cases. I've opened a fix PR (
> https://github.com/apache/kafka/pull/11526) already, and have also opened a
> fix PR (https://github.com/apache/kafka/pull/11524) for the aforementioned
> KAFKA-13469.
>
> Please let me know if we can merge a fix for this in time for the 3.1.0
> release; if not, as with KAFKA-13469, we may want to revert the changes
> from the PR that causes this issue (in this case, that'd be the PR for
> KAFKA-12487).
>
> Cheers,
>
> Chris
>
> On Mon, Nov 22, 2021 at 11:42 AM Chris Egerton  wrote:
>
> > Hi David,
> >
> > I'd like to propose KAFKA-13469 (
> > https://issues.apache.org/jira/browse/KAFKA-13469) as a blocker. It is a
> > regression in Connect caused by recently-merged changes for KAFKA-12226 (
> > https://issues.apache.org/jira/browse/KAFKA-12226) which leads to
> > duplicate records for source tasks. I plan to have a fix PR opened by the
> > end of the day.
> >
> > Please let me know if we can merge a fix for this in time for the 3.1.0
> > release; if not, we may want to revert the changes for KAFKA-12226.
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Nov 15, 2021 at 5:02 AM David Jacot wrote:
> >
> >> Hi folks,
> >>
> >> We reached the code freeze for the Apache Kafka 3.1 release on Friday.
> >> Therefore,
> >> we will only accept blockers from now on.
> >>
> >> There already are a couple of blockers identified which were not
> >> completed before
> >> the code freeze. Please, raise any new blockers to this thread.
> >>
> >> For all the non-blocker issues targeting 3.1.0, I will move them to
> >> the next release.
> >>
> >> Cheers,
> >> David
> >>
> >> On Fri, Oct 29, 2021 at 12:20 PM Dongjin Lee  wrote:
> >> >
> >> > Hi David,
> >> >
> >> > Please update the components of the following KIPs:
> >> >
> >> > - KIP-390: Support Compression Level - Core, Clients
> >> > - KIP-653: Upgrade log4j to log4j2 - Clients, Connect, Core, Streams
> >> > (that is, Log4j-appender, Tools, and Trogdor are excluded.)
> >> >
> >> > Best,
> >> > Dongjin
> >> >
> >> > On Fri, Oct 29, 2021 at 2:24 AM Chris Egerton wrote:
> >> >
> >> > > Hi David,
> >> > >
> >> > > I've moved KIP-618 to the "postponed" section as it will not be
> >> merged in
> >> > > time due to lack of review.
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Chris
> >> > >
> >> > > On Thu, Oct 28, 2021 at 1:07 PM David Jacot wrote:
> >> > >
> >> > > > Hi team,
> >> > > >
> >> > > > Just a quick reminder that the Feature freeze is tomorrow (October
> >> 29th).
> >> > > > In order to be fair with everyone in all the time zones, I plan to
> >> cut
> >> > > the
> >> > > > release branch early next week.
> >> > > >
> >> > > > Cheers,
> >> > > > David
> >> > > >
> >> > > > On Mon, Oct 18, 2021 at 9:56 AM David Jacot wrote:
> >> > > >
> >> > > > > Hi team,
> >> > > > >
> >> > > > > KIP freeze for the next major release of Apache Kafka was reached
> >> > > > > last week.
> >> > > > >
> >> > > > > I have updated the release plan with all the adopted KIPs which
> >> are
> >> > > > > considered
> >> > > > > for AK 3.1.0. Please, verify the plan and let me know if any KIP
> >> should
> >> > > > be
> >> > > > > added
> >> > > > > to or removed from the release plan.
> >> > > > >
> >> > > > > For the KIPs which are still in progress, please work closely
> >> with your
> >> > > > > reviewers
> >> > > > > to make sure that they land on time for the feature freeze.
> >> > > > >
> >> > > > > The next milestone for the AK 3.1.0 release is the feature freeze
> >> on
> >> > > > > October 29th,
> >> > > > > 2021.
> >> > > > >
> >> > > > > Cheers,
> >> > > > > David
> >> > > > >
> >> > > > > On Fri, Oct 15, 2021 at 9:05 AM David Jacot wrote:
> >> > > > >
> >> > > > >> Hi folks,
> >> > > > >>
> >> > > > >> Just a quick reminder that the KIP freeze is today. Don't forget
> >> to
> >> > > > close
> >> > > > >> your ongoing votes.
> >> > > > >>
> >> > > > >> Best,
> >> > > > >> David
> >> > > > >>
> >> > > > >> On Thu, Oct 14, 2021 at 5:31 PM David Jacot wrote:
> >> > > > >>
> >> > > > >>> Hi Luke,
> >> > > > >>>
> >> > > > >>> Added it to the plan.
> >> > > > >>>
> >> > > > >>> Thanks,
> >> > > > >>> David
> >> > > > >>>
> >> > > > >>> On Thu, Oct 14, 2021 at 10:09 AM Luke Chen wrote:
> >> > > > >>>
> >> > > >  Hi David,
> >> > > >  KIP-766 is merged into trunk. Please help add it into the release
> >> > > >  plan.
> >> > > > 
> >> > > > >

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-11-23 Thread Luke Chen
Hi Tom,
Thanks for your comments, and thanks to Artem for his explanation.
Below is my response:

> Currently because buffers are allocated using batch.size it means we can
handle records that are that large (e.g. one big record per batch). Doesn't
the introduction of smaller buffer sizes (batch.initial.size) mean a
corresponding decrease in the maximum record size that the producer can
handle?

Actually, "batch.size" is only a threshold that decides whether a batch
is "ready to be sent". That is, even with "batch.size=16KB" (the default
value), users can still send a single record of 20KB, as long as its size
is less than the producer's "max.request.size" (default 1MB). Therefore,
the introduction of "batch.initial.size" won't decrease the maximum
record size that the producer can handle.
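To make this concrete, here is a minimal illustration of the two settings involved (the values are the stock defaults; this is an illustrative sketch, not a recommended configuration):

```properties
# batch.size is only the "ready to send" threshold, not a hard cap
batch.size=16384
# the actual upper bound on a single record/request is max.request.size
max.request.size=1048576
# a 20KB record exceeds batch.size (so its batch is sent right away)
# but is still accepted because it is below max.request.size
```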

> But isn't there the risk that drainBatchesForOneNode would end up not
sending ready
batches well past when they ought to be sent (according to their linger.ms),
because it's sending buffers for earlier partitions too aggressively?

Did you mean that we have a "max.request.size" per request (default 1MB),
and before this KIP a single request could include 64 batches
["batch.size" (16KB) * 64 = 1MB], but now we might only be able to
include 32 batches or fewer, because we aggressively packed more records
into each batch? That's a really good point that I'd never thought about.
However, I think your suggestion to go through the other partitions that
fit within "batch.size", or whose "linger.ms" has expired, before
handling the one that exceeds the "batch.size" limit is not a good
approach, because the oversized batch would always get the lowest
priority and could starve, never getting a chance to be sent.
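The arithmetic above can be sketched quickly (assuming the default 1MB "max.request.size" and ignoring protocol overhead):

```python
# How many full batches fit into one produce request, assuming the
# default max.request.size of 1MB and ignoring protocol overhead?
MAX_REQUEST_SIZE = 1024 * 1024  # 1MB default

def batches_per_request(batch_size_bytes: int) -> int:
    return MAX_REQUEST_SIZE // batch_size_bytes

print(batches_per_request(16 * 1024))   # 16KB batches -> 64 per request
print(batches_per_request(32 * 1024))   # 32KB batches -> 32 per request
print(batches_per_request(256 * 1024))  # 256KB batches -> 4 per request
```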

I don't have a better solution for it, but maybe I can first decrease
"batch.max.size" to 32KB instead of the more aggressive 256KB in the KIP.
That should alleviate the problem while still improving throughput. What
do you think?
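For reference, a sketch of how the proposed settings might sit together; "batch.initial.size" and "batch.max.size" are the names from the KIP draft and are still under discussion, and the values below are illustrative rather than agreed defaults:

```properties
batch.size=16384          # unchanged "ready to send" threshold
batch.initial.size=4096   # proposed: initial buffer allocated per batch
batch.max.size=32768      # proposed 32KB cap, instead of the 256KB in the draft
```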

Thank you.
Luke

On Tue, Nov 23, 2021 at 9:04 AM Artem Livshits wrote:

> > I think this KIP would change the behaviour of producers when there are
> multiple partitions ready to be sent
>
> This is correct, the pattern changes and becomes more coarse-grained.  But
> I don't think it changes fairness over the long run.  I think it's a good
> idea to change drainIndex to be random rather than round robin to avoid
> forming patterns where some partitions would consistently get higher
> latencies than others because they wait longer for their turn.
>
> If we really wanted to preserve the exact patterns, we could either try to
> support multiple 16KB batches from one partition per request (probably
> would require protocol change to change logic on the broker for duplicate
> detection) or try to re-batch 16KB batches from accumulator into larger
> batches during send (additional computations) or try to consider all
> partitions assigned to a broker to check if a new batch needs to be created
> (i.e. compare cumulative batch size from all partitions assigned to a
> broker and create new batch when cumulative size is 1MB, more complex).
>
> Overall, it seems like just increasing the max batch size is a simpler
> solution and it does favor larger batch sizes, which is beneficial not just
> for production.
>
> > ready batches well past when they ought to be sent (according to their
> linger.ms)
>
> The trigger for marking batches ready to be sent isn't changed - a batch is
> ready to be sent once it reaches 16KB, so by the time larger batches start
> forming, linger.ms wouldn't matter much because the batching goal is met
> and the batch can be sent immediately.  Larger batches start forming once
> the client starts waiting for the server, in which case some data will wait
> its turn to be sent.  This will happen for some data regardless of how we
> pick data to send, the question is just whether we'd have some scenarios
> where some partitions would consistently experience higher latency than
> others.  I think picking drainIndex randomly would prevent such scenarios.
>
> -Artem
>
> On Mon, Nov 22, 2021 at 2:28 AM Tom Bentley  wrote:
>
> > Hi Luke,
> >
> > Thanks for the KIP!
> >
> > Currently because buffers are allocated using batch.size it means we can
> > handle records that are that large (e.g. one big record per batch).
> Doesn't
> > the introduction of smaller buffer sizes (batch.initial.size) mean a
> > corresponding decrease in the maximum record size that the producer can
> > handle? That might not be a problem if the user knows their maximum
> record
> > size and has tuned batch.initial.size accordingly, but if the default for
> > batch.initial.size < batch.size it could cause regressions for existing
> > users with a large record size, I think. It should be enough for
> > batch.initial.size to default to batch.size, allowing users who care
> about
> > the memory saving in the off-peak throughput case to do the tuning, but
> not
> > causing a regr