[jira] [Created] (KAFKA-16391) Cleanup .lock file after server is down

2024-03-19 Thread PoAn Yang (Jira)
PoAn Yang created KAFKA-16391:
-

 Summary: Cleanup .lock file after server is down
 Key: KAFKA-16391
 URL: https://issues.apache.org/jira/browse/KAFKA-16391
 Project: Kafka
  Issue Type: Improvement
Reporter: PoAn Yang
Assignee: PoAn Yang


Currently, the server adds a `.lock` file to each log folder. The file is useless 
after the server is down.
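
A minimal sketch of the proposed cleanup, written in Python for brevity. It is not Kafka's implementation: `locked_log_dir` is a hypothetical helper, and `fcntl.flock` (Unix-only) stands in for whatever locking the broker actually uses. It takes an exclusive lock on the `.lock` file in a log folder and removes the file when the holder shuts down.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

LOCK_FILE_NAME = ".lock"


@contextmanager
def locked_log_dir(log_dir):
    """Hold an exclusive lock on <log_dir>/.lock and delete the file on exit."""
    lock_path = os.path.join(log_dir, LOCK_FILE_NAME)
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        # Non-blocking: raises OSError if another process already holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)  # leave the other holder's file in place
        raise
    try:
        yield lock_path
    finally:
        # Clean up on shutdown: remove the file, then release the lock by closing.
        os.unlink(lock_path)
        os.close(fd)


# Usage with a throwaway directory standing in for a log folder.
log_dir = tempfile.mkdtemp()
with locked_log_dir(log_dir) as lock_path:
    held = os.path.exists(lock_path)
```

Once the `with` block exits, the log folder no longer contains a `.lock` file.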



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16390) consumer_bench_test.py failed using AsyncKafkaConsumer

2024-03-19 Thread Philip Nee (Jira)
Philip Nee created KAFKA-16390:
--

 Summary: consumer_bench_test.py failed using AsyncKafkaConsumer
 Key: KAFKA-16390
 URL: https://issues.apache.org/jira/browse/KAFKA-16390
 Project: Kafka
  Issue Type: Task
  Components: consumer, system tests
Reporter: Philip Nee


Ran the system test based on KAFKA-16273

The following tests failed using the consumer group protocol
{code:java}
kafkatest.tests.core.consume_bench_test.ConsumeBenchTest.test_consume_bench.topics=.consume_bench_topic.0-5.0-4.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer

kafkatest.tests.core.consume_bench_test.ConsumeBenchTest.test_multiple_consumers_random_group_partitions.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer

kafkatest.tests.core.consume_bench_test.ConsumeBenchTest.test_single_partition.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
 {code}
Because of
{code:java}
TimeoutError('consume_workload failed to finish in the expected amount of time.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 186, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 246, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 433, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/core/consume_bench_test.py", line 146, in test_single_partition
    consume_workload.wait_for_done(timeout_sec=180)
  File "/opt/kafka-dev/tests/kafkatest/services/trogdor/trogdor.py", line 352, in wait_for_done
    wait_until(lambda: self.done(),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: consume_workload failed to finish in the expected amount of time.
{code}
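
The failure comes from a poll-until-timeout loop. The following is a simplified stand-in for that pattern, not ducktape's actual `wait_until`: `poll_until` and its parameters are hypothetical, and only the final `raise` mirrors the line shown in the traceback.

```python
import time


def poll_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Call condition() repeatedly until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    # err_msg may be a string or a zero-argument callable, as in the traceback.
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg)
```

In the failing tests the condition (`self.done()`) never became true within 180 seconds, so the loop fell through to the `raise`.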





[jira] [Resolved] (KAFKA-12217) Apply the new features of Junit 5.8 to code base

2024-03-19 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-12217.

Resolution: Fixed

> Apply the new features of Junit 5.8 to code base
> 
>
> Key: KAFKA-12217
> URL: https://issues.apache.org/jira/browse/KAFKA-12217
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>
> There are two useful new features in JUnit 5.8.
> 1. assertInstanceOf (https://github.com/junit-team/junit5/pull/2499)
> It offers a more meaningful error message than "assertTrue(obj instanceof X)".
> 2. junit.jupiter.params.displayname.default (https://github.com/junit-team/junit5/pull/2532)
> It offers a default display name for all parameterized tests.





[jira] [Resolved] (KAFKA-12187) replace assertTrue(obj instanceof X) by assertInstanceOf when we update to JUnit 5.8

2024-03-19 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-12187.

Fix Version/s: 3.8.0
   Resolution: Fixed

> replace assertTrue(obj instanceof X) by assertInstanceOf when we update to 
> JUnit 5.8
> 
>
> Key: KAFKA-12187
> URL: https://issues.apache.org/jira/browse/KAFKA-12187
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Kuan Po Tseng
>Priority: Minor
> Fix For: 3.8.0
>
>
> see [https://github.com/apache/kafka/pull/9874#discussion_r556547909]
>  
> {quote}Yeah, for existing code improvements (versus code introduced by this 
> change), let's do it via a different PR. For this particular issue, we can 
> probably wait for JUnit 5.8 and use:
> {quote}
> * New assertInstanceOf methods as a replacement for assertTrue(obj instanceof 
> X) which provide better error messages comparable to those of assertThrows.
>  related PR: https://github.com/junit-team/junit5/pull/2499





[jira] [Created] (KAFKA-16389) consumer_test.py’s test_valid_assignment fails with new consumer

2024-03-19 Thread Kirk True (Jira)
Kirk True created KAFKA-16389:
-

 Summary: consumer_test.py’s test_valid_assignment fails with new 
consumer
 Key: KAFKA-16389
 URL: https://issues.apache.org/jira/browse/KAFKA-16389
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


The following error is reported when running the {{test_valid_assignment}} test 
from {{consumer_test.py}}:

 {code}
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 186, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 246, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 433, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 584, in test_valid_assignment
    wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, consumer.current_assignment()),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])]
{code}

To reproduce, create a system test suite file named 
{{test_valid_assignment.yml}} with these contents:

{code:yaml}
failures:
  - 'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}'
{code}

Then set the {{TC_PATHS}} environment variable to include that test 
suite file and run the system tests.





[jira] [Resolved] (KAFKA-16367) Full ConsumerGroupHeartbeat response must be sent when full request is received

2024-03-19 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16367.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Full ConsumerGroupHeartbeat response must be sent when full request is 
> received
> ---
>
> Key: KAFKA-16367
> URL: https://issues.apache.org/jira/browse/KAFKA-16367
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>






Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #114

2024-03-19 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16388) add production-ready test of 3.3 - 3.6 release to MetadataVersionTest.testFromVersionString

2024-03-19 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16388:
--

 Summary: add production-ready test of 3.3 - 3.6 release to 
MetadataVersionTest.testFromVersionString
 Key: KAFKA-16388
 URL: https://issues.apache.org/jira/browse/KAFKA-16388
 Project: Kafka
  Issue Type: Test
Reporter: Chia-Ping Tsai


https://github.com/apache/kafka/blob/trunk/server-common/src/test/java/org/apache/kafka/server/common/MetadataVersionTest.java#L169

We have already released 3.3 through 3.6, so they should be covered by 
MetadataVersionTest.testFromVersionString:

{code:java}
assertEquals(IBP_3_3_IV3, MetadataVersion.fromVersionString("3.3"));
assertEquals(IBP_3_4_IV0, MetadataVersion.fromVersionString("3.4"));
assertEquals(IBP_3_5_IV2, MetadataVersion.fromVersionString("3.5"));
assertEquals(IBP_3_6_IV2, MetadataVersion.fromVersionString("3.6"));
{code} 





[VOTE] KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header

2024-03-19 Thread Nelson B.
Hi all,

I would like to start a vote on KIP-1025, which would optionally
URL-encode clientID and clientSecret in the authorization header.

I feel like all possible issues have been addressed in the discussion
thread.

Thanks,
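
A minimal sketch of what optional URL-encoding means for the header value, in Python for brevity. It assumes the credentials travel in an HTTP Basic authorization header; `basic_auth_header` and its `urlencode` flag are hypothetical names that only mirror the intent of the proposed config, and the sample credentials are made up.

```python
import base64
from urllib.parse import quote_plus


def basic_auth_header(client_id, client_secret, urlencode=False):
    """Build a Basic authorization header, optionally form-URL-encoding both parts."""
    if urlencode:
        # Encode each part separately so a ':' inside the secret cannot be
        # mistaken for the id/secret separator.
        client_id = quote_plus(client_id)
        client_secret = quote_plus(client_secret)
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Made-up credentials containing characters that need encoding.
plain_header = basic_auth_header("my client", "p@ss:word")
encoded_header = basic_auth_header("my client", "p@ss:word", urlencode=True)
```

With the flag off, the header carries the raw values, which keeps existing behavior; with it on, reserved characters are escaped before Base64 encoding.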


Re: [DISCUSS] KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header

2024-03-19 Thread Kirk True
Hi Nelson,

Piggybacking on KIP-1030 seems like the perfect solution.

The configuration name change sounds good, too.

Thanks,
Kirk

> On Mar 18, 2024, at 2:21 PM, Nelson B.  wrote:
> 
> Hi Kirk,
> 
> Thanks for your comments!
> 
> 1. I think we can use KIP-1030 as an opportunity to update the default
> value to "true" starting from version 4.0.
> 2. I've updated the config name to "sasl.oauthbearer.header.urlencode" in
> the KIP, I'm gonna update PR once KIP is accepted.
> 
> Thanks,
> 
> On Tue, Mar 19, 2024 at 3:45 AM Kirk True  wrote:
> 
>> Hi Nelson,
>> 
>> Thank you for writing up the KIP! My apologies for the delay in response :(
>> 
>> Questions:
>> 
>> 1. Is the long-term plan to keep the configuration default set to “false”?
>> I understand the short-term benefits, but in general, configuration
>> defaults should prefer compliance with standards (e.g. RFCs).
>> 2. Can we change “sasl.oauthbearer.header.urlencode.enable” to be a little
>> shorter? Maybe “sasl.oauthbearer.header.urlencode” or even
>> “sasl.oauthbearer.urlencode”? I’m looking at the configuration names that I
>> introduced in KIP-768 with a bit of cringe at their length :) This is a
>> total nit, so I won’t make a stink about it if everyone else is cool with
>> it :)
>> 
>> Thanks,
>> Kirk
>> 
>>> On Mar 13, 2024, at 5:31 AM, Nelson B.  wrote:
>>> 
>>> Hi all,
>>> 
>>> I just wanted to bump up this thread.
>>> 
>>> The KIP introduces a really small change and PR is already ready and only
>>> waiting for this KIP to get approved to be merged.
>>> 
>>> Thanks,
>>> 
>>> On Wed, Mar 6, 2024 at 12:26 PM Nelson B. 
>> wrote:
>>> 
>>>> Hi all,
>>>>
>>>> I would like to start a discussion on KIP-1025, which would optionally
>>>> URL-encode clientID and clientSecret in the authorization header
>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1025%3A+Optionally+URL-encode+clientID+and+clientSecret+in+authorization+header
>>>>
>>>> Best,
>>>> Nelson B.
>>>>
>> 
>> 



[jira] [Created] (KAFKA-16387) Allow kafka-metadata-shell to read a running server metadata

2024-03-19 Thread PoAn Yang (Jira)
PoAn Yang created KAFKA-16387:
-

 Summary: Allow kafka-metadata-shell to read a running server 
metadata
 Key: KAFKA-16387
 URL: https://issues.apache.org/jira/browse/KAFKA-16387
 Project: Kafka
  Issue Type: Improvement
Reporter: PoAn Yang
Assignee: PoAn Yang


Currently, kafka-metadata-shell tries to acquire the file lock before reading the 
data, so it can't read the metadata of a running server.


If users don't need the latest data, kafka-metadata-shell can provide 
an option to copy the data to another place and read only the copied data. In 
this case, kafka-metadata-shell can work without shutting down the server.
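
A minimal sketch of the copy-then-read idea, in Python for brevity. `copy_metadata_dir` is a hypothetical helper, skipping the `.lock` file is an assumption about what the copy should leave behind, and the file names in the usage part are placeholders rather than a real metadata directory layout.

```python
import os
import shutil
import tempfile


def copy_metadata_dir(source_dir):
    """Copy source_dir into a fresh temp directory, skipping the `.lock` file."""
    target = os.path.join(tempfile.mkdtemp(prefix="metadata-shell-"), "snapshot")
    shutil.copytree(source_dir, target, ignore=shutil.ignore_patterns(".lock"))
    return target


# Usage with a stand-in directory; a reader would then open `snapshot`
# instead of the live directory and never touch the server's lock.
source = tempfile.mkdtemp()
for name in (".lock", "00000000000000000000.log"):
    with open(os.path.join(source, name), "w") as f:
        f.write("x")
snapshot = copy_metadata_dir(source)
copied = sorted(os.listdir(snapshot))
```

The copy reflects the data as of the moment it was taken, which matches the trade-off described above: no server shutdown, but not necessarily the latest records.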





[jira] [Created] (KAFKA-16386) NETWORK_EXCEPTIONs from transaction verification are not translated

2024-03-19 Thread Sean Quah (Jira)
Sean Quah created KAFKA-16386:
-

 Summary: NETWORK_EXCEPTIONs from transaction verification are not 
translated
 Key: KAFKA-16386
 URL: https://issues.apache.org/jira/browse/KAFKA-16386
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Sean Quah


KAFKA-14402 
([KIP-890|https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense])
 adds verification with the transaction coordinator on Produce and 
TxnOffsetCommit paths as a defense against hanging transactions. For 
compatibility with older clients, retriable errors from the verification step 
are translated to ones already expected and handled by existing clients. When 
verification was added, we forgot to translate {{NETWORK_EXCEPTION}}s.

[~dajac] noticed this manifesting as a test failure when 
tests/kafkatest/tests/core/transactions_test.py was run with an older client 
(pre KAFKA-16122):
{quote}
{{NETWORK_EXCEPTION}} is indeed returned as a partition error. The 
{{TransactionManager.TxnOffsetCommitHandler}} considers it as a fatal error so 
it transitions to the fatal state.
It seems that there are two cases where the server could return it: (1) when 
the verification request times out or its connection is cut; or (2) in 
{{AddPartitionsToTxnManager.addTxnData}}, where we say that we use it because we 
want a retriable error.
{quote}

The first case was triggered as part of the test. The second case happens when 
there is already a verification request ({{AddPartitionsToTxn}}) in flight with 
the same epoch and we want clients to try again when we're not busy.
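
A minimal sketch of the translation step, in Python for brevity. The names below are real Kafka error names, but which errors belong in the set and which error they map to are assumptions made for illustration; `translate_verification_error` is a hypothetical function, not the broker's code.

```python
# Assumed set of retriable errors that the verification step can produce.
RETRIABLE_VERIFICATION_ERRORS = {
    "COORDINATOR_NOT_AVAILABLE",
    "NOT_COORDINATOR",
    "CONCURRENT_TRANSACTIONS",
    "NETWORK_EXCEPTION",  # the case this issue reports as untranslated
}

# Assumed target: a retriable error that older clients already handle.
TRANSLATED_ERROR = "NOT_ENOUGH_REPLICAS"


def translate_verification_error(error):
    """Map retriable verification errors to one that older clients retry on."""
    if error in RETRIABLE_VERIFICATION_ERRORS:
        return TRANSLATED_ERROR
    return error  # everything else is passed through unchanged
```

Without the `NETWORK_EXCEPTION` entry, the error would reach the client as-is, which is the situation described in the quote above.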





Re: [DISCUSS] Minimum constraint for segment.ms

2024-03-19 Thread Doğuşcan Namal
Hi all,

message.max.bytes, replica.fetch.max.bytes and their derivatives also
require a constraint on their maximum value, bounded by the total memory
on the instance. Otherwise, they could cause out-of-memory errors on the
instance.

Do you think this is in scope here as well?

On Thu, 14 Mar 2024 at 10:29, Haruki Okada  wrote:

> Hi, Divij.
>
> This isn't about a config default value/constraint change, but I found
> a behavior discrepancy in the max.block.ms config, which may cause a
> breaking change if we change the behavior.
> The detail is described in the ticket:
> https://issues.apache.org/jira/browse/KAFKA-16372
>
> What do you think?
>
> On Thu, 14 Mar 2024 at 13:09, Kamal Chandraprakash wrote:
>
> > One use case I see for setting the `segment.bytes` to 1 is to delete all
> > the records from the topic.
> > We can mention about it in the doc to use the `kafka-delete-records` API
> > instead.
> >
> >
> >
> >
> > On Wed, Mar 13, 2024 at 6:59 PM Divij Vaidya 
> > wrote:
> >
> > > + users@kafka
> > >
> > > Hi users of Apache Kafka
> > >
> > > With the upcoming 4.0 release, we have an opportunity to improve the
> > > constraints and default values for various Kafka configurations.
> > >
> > > We are soliciting your feedback and suggestions on configurations where
> > the
> > > default values and/or constraints should be adjusted. Please reply in
> > this
> > > thread directly.
> > >
> > > --
> > > Divij Vaidya
> > > Apache Kafka PMC
> > >
> > >
> > >
> > > On Wed, Mar 13, 2024 at 12:56 PM Divij Vaidya  >
> > > wrote:
> > >
> > > > Thanks for the discussion folks. I have started a KIP
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
> > > > > to keep track of the changes that we are discussing. Please consider
> > this
> > > > as a collaborative work-in-progress KIP and once it is ready to be
> > > > published, we can start a discussion thread on it.
> > > >
> > > > I am also going to start a thread to solicit feedback from users@
> > > mailing
> > > > list as well.
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Wed, Mar 13, 2024 at 12:55 PM Christopher Shannon <
> > > > christopher.l.shan...@gmail.com> wrote:
> > > >
> > > >> I think it's a great idea to raise a KIP to look at adjusting
> defaults
> > > and
> > > >> minimum/maximum config values for version 4.0.
> > > >>
> > > >> As pointed out, the minimum values for segment.ms and segment.bytes
> > > don't
> > > >> make sense and would probably bring down a cluster pretty quickly if
> > set
> > > >> that low, so version 4.0 is a good time to fix it and to also look
> at
> > > the
> > > >> other configs as well for adjustments.
> > > >>
> > > >> On Wed, Mar 13, 2024 at 4:39 AM Sergio Daniel Troiano
> > > >>  wrote:
> > > >>
> > > >> > hey guys,
> > > >> >
> > > >> > Regarding to num.recovery.threads.per.data.dir: I agree, in our
> > > company
> > > >> we
> > > >> > use the number of vCPUs to do so as this is not competing with
> ready
> > > >> > cluster traffic.
> > > >> >
> > > >> >
> > > >> > On Wed, 13 Mar 2024 at 09:29, Luke Chen 
> wrote:
> > > >> >
> > > >> > > Hi Divij,
> > > >> > >
> > > >> > > Thanks for raising this.
> > > >> > > The valid minimum value 1 for `segment.ms` is completely
> > > >> unreasonable.
> > > >> > > Similarly for `segment.bytes`, `metadata.log.segment.ms`,
> > > >> > > `metadata.log.segment.bytes`.
> > > >> > >
> > > >> > > In addition to that, there are also some config default values
> > we'd
> > > >> like
> > > >> > to
> > > >> > > propose to change in v4.0.
> > > >> > > We can collect more comments from the community, and come out
> > with a
> > > >> KIP
> > > >> > > for them.
> > > >> > >
> > > >> > > 1. num.recovery.threads.per.data.dir:
> > > >> > > The current default value is 1. But the log recovery is
> happening
> > > >> before
> > > >> > > brokers are in ready state, which means, we should use all the
> > > >> available
> > > >> > > resource to speed up the log recovery to bring the broker to
> ready
> > > >> state
> > > >> > > soon. Default value should be... maybe 4 (to be decided)?
> > > >> > >
> > > >> > > 2. Other configs might be able to consider to change the
> default,
> > > but
> > > >> > open
> > > >> > > for comments:
> > > >> > >2.1. num.replica.fetchers: default is 1, but that's not
> enough
> > > when
> > > >> > > there are multiple partitions in the cluster
> > > >> > >2.2.
> `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`:
> > > >> > > Currently, we set 100kb as default value, but that's not enough
> > for
> > > >> > > high-speed network.
> > > >> > >
> > > >> > > Thank you.
> > > >> > > Luke
> > > >> > >
> > > >> > >
> > > >> > > On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya <
> > > divijvaidy...@gmail.com
> > > >> >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hey folks
> > > >> > > >
> > > >> > > > Before I file a KIP to change this in 4.0, I 

Re: [VOTE] KIP-956: Tiered Storage Quotas

2024-03-19 Thread Abhijeet Kumar
Hi All,

This KIP is accepted with 3 binding +1 votes (Jun, Satish, Luke) and 2
non-binding +1 votes (Kamal, Jorge).

Thank you all for voting.

Regards.
Abhijeet.



On Tue, Mar 19, 2024 at 3:35 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Abhijeet! Looking forward to this one.
> +1 (non-binding).
>
> On Thu, 14 Mar 2024 at 06:08, Luke Chen  wrote:
>
> > Thanks for the KIP!
> > +1 from me.
> >
> > Luke
> >
> > On Sun, Mar 10, 2024 at 8:44 AM Satish Duggana  >
> > wrote:
> >
> > > Thanks Abhijeet for the KIP, +1 from me.
> > >
> > >
> > > On Sat, 9 Mar 2024 at 1:51 AM, Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > +1 (non-binding), Thanks for the KIP, Abhijeet!
> > > >
> > > > --
> > > > Kamal
> > > >
> > > > On Fri, Mar 8, 2024 at 11:02 PM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the KIP. +1
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Mar 8, 2024 at 3:44 AM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to start the vote for KIP-956 - Tiered Storage
> Quotas
> > > > > >
> > > > > > The KIP is here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > > > >
> > > > > > Regards.
> > > > > > Abhijeet.
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-956: Tiered Storage Quotas

2024-03-19 Thread Jorge Esteban Quilcate Otoya
Thanks Abhijeet! Looking forward to this one.
+1 (non-binding).

On Thu, 14 Mar 2024 at 06:08, Luke Chen  wrote:

> Thanks for the KIP!
> +1 from me.
>
> Luke
>
> On Sun, Mar 10, 2024 at 8:44 AM Satish Duggana 
> wrote:
>
> > Thanks Abhijeet for the KIP, +1 from me.
> >
> >
> > On Sat, 9 Mar 2024 at 1:51 AM, Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > +1 (non-binding), Thanks for the KIP, Abhijeet!
> > >
> > > --
> > > Kamal
> > >
> > > On Fri, Mar 8, 2024 at 11:02 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the KIP. +1
> > > >
> > > > Jun
> > > >
> > > > On Fri, Mar 8, 2024 at 3:44 AM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start the vote for KIP-956 - Tiered Storage Quotas
> > > > >
> > > > > The KIP is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > > >
> > > > > Regards.
> > > > > Abhijeet.
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-19 Thread Jorge Esteban Quilcate Otoya
Sorry I missed that comment on the thread. Proposal looks great, thanks,
Abhijeet!

On Sat, 16 Mar 2024 at 13:19, Abhijeet Kumar 
wrote:

> Hi Jorge,
>
> The config names were chosen to be consistent with the other existing
> quota configs, such as
> *replica.alter.log.dirs.io.max.bytes.per.second*, as pointed out by Jun in
> the thread.
>
> Also, we can revisit the names of the components during implementation,
> since those are not exposed to the user.
>
> Please let me know if you have any further concerns.
>
> Regards,
> Abhijeet.
>
>
>
> On Mon, Mar 11, 2024 at 6:11 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the KIP! Looks good to me. I just have a minor comment on
> > naming:
> >
> > Would it work to align the config names with existing quota names?
> > e.g. `remote.log.manager.copy.byte.rate.quota` (or similar) instead of
> > `remote.log.manager.copy.max.bytes.per.second`?
> >
> > Same for new components, could we use the same verbs as in the configs:
> > - RLMCopyQuotaManager
> > - RLMFetchQuotaManager
> >
> >
> > On Fri, 8 Mar 2024 at 13:43, Abhijeet Kumar 
> > wrote:
> >
> > > Thank you all for your comments. As all the comments in the thread are
> > > addressed, I am starting a Vote thread for the KIP. Please have a look.
> > >
> > > Regards.
> > >
> > > On Thu, Mar 7, 2024 at 12:34 PM Luke Chen  wrote:
> > >
> > > > Hi Abhijeet,
> > > >
> > > > Thanks for the update and the explanation.
> > > > I had another look, and it LGTM now!
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > > On Tue, Mar 5, 2024 at 2:50 AM Jun Rao 
> > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the reply. Sounds good to me.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Sat, Mar 2, 2024 at 7:40 PM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for pointing it out. It makes sense to me. We can have the
> > > > > following
> > > > > > metrics instead. What do you think?
> > > > > >
> > > > > >- remote-(fetch|copy)-throttle-time-avg (The average time in
> ms
> > > > remote
> > > > > >fetches/copies was throttled by a broker)
> > > > > >- remote-(fetch|copy)-throttle-time--max (The maximum time in
> ms
> > > > > remote
> > > > > >fetches/copies was throttled by a broker)
> > > > > >
> > > > > > These are similar to fetch-throttle-time-avg and
> > > > fetch-throttle-time-max
> > > > > > metrics we have for Kafak Consumers?
> > > > > > The Avg and Max are computed over the (sliding) window as defined
> > by
> > > > the
> > > > > > configuration metrics.sample.window.ms on the server.
> > > > > >
> > > > > > (Also, I will update the config and metric names to be
> consistent)
> > > > > >
> > > > > > Regards.
> > > > > >
> > > > > > On Thu, Feb 29, 2024 at 2:51 AM Jun Rao  >
> > > > > wrote:
> > > > > >
> > > > > > > Hi, Abhijeet,
> > > > > > >
> > > > > > > Thanks for the reply.
> > > > > > >
> > > > > > > The issue with recording the throttle time as a gauge is that
> > it's
> > > > > > > transient. If the metric is not read immediately, the recorded
> > > value
> > > > > > could
> > > > > > > be reset to 0. The admin won't realize that throttling has
> > > happened.
> > > > > > >
> > > > > > > For client quotas, the throttle time is tracked as the average
> > > > > > > throttle-time per user/client-id. This makes the metric less
> > > > transient.
> > > > > > >
> > > > > > > Also, the configs use read/write whereas the metrics use
> > > fetch/copy.
> > > > > > Could
> > > > > > > we make them consistent?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Wed, Feb 28, 2024 at 6:49 AM Abhijeet Kumar <
> > > > > > abhijeet.cse@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Clarified the meaning of the two metrics. Also updated the
> KIP.
> > > > > > > >
> > > > > > > > kafka.log.remote:type=RemoteLogManager,
> > > > name=RemoteFetchThrottleTime
> > > > > ->
> > > > > > > The
> > > > > > > > duration of time required at a given moment to bring the
> > observed
> > > > > fetch
> > > > > > > > rate within the allowed limit, by preventing further reads.
> > > > > > > > kafka.log.remote:type=RemoteLogManager,
> > > name=RemoteCopyThrottleTime
> > > > > ->
> > > > > > > The
> > > > > > > > duration of time required at a given moment to bring the
> > observed
> > > > > > remote
> > > > > > > > copy rate within the allowed limit, by preventing further
> > copies.
> > > > > > > >
> > > > > > > > Regards.
> > > > > > > >
> > > > > > > > On Wed, Feb 28, 2024 at 12:28 AM Jun Rao
> > >  > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Abhijeet,
> > > > > > > > >
> > > > > > > > > Thanks for the explanation. Makes sense to me now.
> > > > > > > > >
> > > > > > > > > Just a minor comment. Could you document the exact meaning
> of
> > > 

[jira] [Resolved] (KAFKA-16378) Under tiered storage, deleting local logs does not free disk space

2024-03-19 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen resolved KAFKA-16378.
--
Resolution: Fixed

> Under tiered storage, deleting local logs does not free disk space
> --
>
> Key: KAFKA-16378
> URL: https://issues.apache.org/jira/browse/KAFKA-16378
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-15-09-33-13-903.png
>
>
> This happens only occasionally: whenever a tiered storage topic triggers 
> deletion of local log segments, there is a chance that references to the 
> deleted files linger, even though the files can no longer be found on the 
> local disk.
> I use this implementation: [Aiven-Open/tiered-storage-for-apache-kafka: 
> RemoteStorageManager for Apache Kafka® Tiered Storage 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka]
> I also filed an issue in their community, which also contains a full 
> description of the problem
> [Disk space not released · Issue #513 · 
> Aiven-Open/tiered-storage-for-apache-kafka 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/issues/513]
> !image-2024-03-15-09-33-13-903.png!
> You can clearly see in this figure that the Kafka log already reports the 
> operation that deleted the log segment, but the file is still referenced 
> and the disk space has not been released.





[jira] [Created] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached

2024-03-19 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16385:
-

 Summary: Segment is rolled before segment.ms or segment.bytes 
breached
 Key: KAFKA-16385
 URL: https://issues.apache.org/jira/browse/KAFKA-16385
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


Steps to reproduce:
1. Create a topic with the config segment.ms=7days, retention.ms=1sec.
2. Send a record "aaa" to the topic
3. Wait for 1 second

Will this segment be rolled? I expected not.
But in my test it was rolled:

{code:java}
[2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. (kafka.log.LocalLog)
[2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote producer snapshot at offset 1 with 1 producer ids in 1 ms. (org.apache.kafka.storage.internals.log.ProducerStateManager)
[2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to log retention time 1000ms breach based on the largest record timestamp in the segment (kafka.log.UnifiedLog)
{code}

The segment is rolled because the 1000 ms log retention time was breached, which is 
unexpected.
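
A toy model of the observed behavior, in Python for brevity. It is not Kafka's code: `ActiveSegment` and `roll_reason` are hypothetical, the 1 GiB `segment.bytes` value and the `now_ms` value are assumed, and the third check simply encodes what the log lines above show, namely that a retention breach on the active segment also leads to a roll.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActiveSegment:
    created_ms: int
    largest_record_timestamp_ms: int
    size_bytes: int


def roll_reason(seg, now_ms, segment_ms, segment_bytes, retention_ms) -> Optional[str]:
    """Return the config that triggers a roll of the active segment, if any."""
    if seg.size_bytes >= segment_bytes:
        return "segment.bytes"
    if now_ms - seg.created_ms >= segment_ms:
        return "segment.ms"
    # Observed: the retention check also applies to the active segment.
    if now_ms - seg.largest_record_timestamp_ms > retention_ms:
        return "retention.ms"
    return None


# Size and timestamp taken from the LogSegment line above; the rest is assumed.
seg = ActiveSegment(created_ms=1710832992125,
                    largest_record_timestamp_ms=1710832992125,
                    size_bytes=71)
reason = roll_reason(seg,
                     now_ms=1710832992125 + 1800,      # assumed check time
                     segment_ms=7 * 24 * 60 * 60 * 1000,
                     segment_bytes=1024 ** 3,           # assumed 1 GiB
                     retention_ms=1000)
```

In this model neither `segment.ms` nor `segment.bytes` is breached, yet `reason` is `"retention.ms"`, which is the surprise the report describes.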





[REVIEW REQUEST] ConsumerGroupCommand moved to tools

2024-03-19 Thread Николай Ижиков
Hello.

Thanks to previous patches, all tests of `ConsumerGroupCommand` have moved to the 
`tools` module.
One step is left: moving the command itself.

The PR is ready for review: https://github.com/apache/kafka/pull/14471

Please take a look.
Let’s make it happen.

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2731

2024-03-19 Thread Apache Jenkins Server
See