[jira] [Commented] (KAFKA-12506) Expand AdjustStreamThreadCountTest

2024-04-03 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833816#comment-17833816
 ] 

Arpit Goyal commented on KAFKA-12506:
-

[~kebab-mai-haddi] You can reassign this ticket back to yourself if you want 
to work on it. 
[~ableegoldman] Could you help me find some newbie tickets that would help me 
get started on Kafka Streams?

> Expand AdjustStreamThreadCountTest
> --
>
> Key: KAFKA-12506
> URL: https://issues.apache.org/jira/browse/KAFKA-12506
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: A. Sophie Blee-Goldman
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: newbie, newbie++
>
> Right now the AdjustStreamThreadCountTest runs a minimal topology that just 
> consumes a single input topic, and doesn't produce any data to this topic. 
> Some of the complex concurrency bugs that we've found only showed up when we 
> had some actual data to process and a stateful topology: 
> [KAFKA-12503|https://issues.apache.org/jira/browse/KAFKA-12503], KAFKA-12500.
> See the umbrella ticket for the list of improvements we need to make this a 
> more effective integration test.
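For illustration, here is a minimal sketch of a stateful topology that actually processes input data, along the lines the description asks for; the topic name "input", the store name "counts", and the serdes are assumptions for this sketch, not the test's actual code:

{code:java}
// Minimal sketch, assuming an "input" topic and a "counts" store: a stateful
// topology with a local state store, so thread add/remove has state to manage.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;

public class StatefulTopologySketch {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        // Consume the input topic and maintain a per-key count backed by a
        // local state store, rather than a topology that only consumes.
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count(Materialized.as("counts"));
        return builder.build();
    }
}
{code}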



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-12506) Expand AdjustStreamThreadCountTest

2024-03-17 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827837#comment-17827837
 ] 

Arpit Goyal commented on KAFKA-12506:
-

[~ableegoldman] Is this ticket still open to work on? I'd like to pick it up 
and start learning Kafka Streams.

> Expand AdjustStreamThreadCountTest
> --
>
> Key: KAFKA-12506
> URL: https://issues.apache.org/jira/browse/KAFKA-12506
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: A. Sophie Blee-Goldman
>Assignee: Aviral Srivastava
>Priority: Major
>  Labels: newbie, newbie++
>
> Right now the AdjustStreamThreadCountTest runs a minimal topology that just 
> consumes a single input topic, and doesn't produce any data to this topic. 
> Some of the complex concurrency bugs that we've found only showed up when we 
> had some actual data to process and a stateful topology: 
> [KAFKA-12503|https://issues.apache.org/jira/browse/KAFKA-12503], KAFKA-12500.
> See the umbrella ticket for the list of improvements we need to make this a 
> more effective integration test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16209) fetchSnapshot might return null if topic is created before v2.8

2024-01-29 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812139#comment-17812139
 ] 

Arpit Goyal commented on KAFKA-16209:
-

[~showuon] I am picking it up. 

> fetchSnapshot might return null if topic is created before v2.8
> ---
>
> Key: KAFKA-16209
> URL: https://issues.apache.org/jira/browse/KAFKA-16209
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Luke Chen
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: newbie, newbie++
>
> Remote log manager will fetch snapshot via ProducerStateManager 
> [here|https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerStateManager.java#L608],
>  but the snapshot map might contain nothing if the topic has no snapshot 
> created, e.g. topics created before v2.8. We need to fix this to avoid an NPE.
> old PR: https://github.com/apache/kafka/pull/14615/
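As a hedged illustration of the fix idea (hypothetical names; the actual ProducerStateManager API may differ), the missing-snapshot case can be surfaced as an empty Optional instead of a null:

{code:java}
// Minimal sketch, assuming a snapshots map keyed by offset that can be empty
// for topics created before v2.8; not the actual ProducerStateManager code.
import java.io.File;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

public class SnapshotLookupSketch {
    private final ConcurrentSkipListMap<Long, File> snapshots = new ConcurrentSkipListMap<>();

    public Optional<File> fetchSnapshot(long offset) {
        // Map.get returns null when no snapshot was ever written; wrap it so
        // callers handle the "no snapshot" case explicitly instead of an NPE.
        return Optional.ofNullable(snapshots.get(offset));
    }
}
{code}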



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16209) fetchSnapshot might return null if topic is created before v2.8

2024-01-29 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-16209:
---

Assignee: Arpit Goyal

> fetchSnapshot might return null if topic is created before v2.8
> ---
>
> Key: KAFKA-16209
> URL: https://issues.apache.org/jira/browse/KAFKA-16209
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Luke Chen
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: newbie, newbie++
>
> Remote log manager will fetch snapshot via ProducerStateManager 
> [here|https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerStateManager.java#L608],
>  but the snapshot map might contain nothing if the topic has no snapshot 
> created, e.g. topics created before v2.8. We need to fix this to avoid an NPE.
> old PR: https://github.com/apache/kafka/pull/14615/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.

2024-01-28 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811643#comment-17811643
 ] 

Arpit Goyal edited comment on KAFKA-16088 at 1/28/24 9:51 AM:
--

[~satish.duggana] [~divijvaidya] [~ckamal] I am going to start working on it. 
As per my understanding of the code, the next fetch offset is incremented only 
when the previous response contains some records. 

In the case mentioned above, I was thinking of introducing a delayed fetch for 
the next higher offset within DelayedRemoteFetch, instead of invoking the 
callback when the set of records is empty. Let me know if there are other 
possible options to handle this scenario. 
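A minimal sketch of this idea follows, using hypothetical stand-in types rather than Kafka's actual DelayedRemoteFetch and purgatory APIs:

{code:java}
// Hypothetical sketch: on an empty remote read, re-enqueue a delayed fetch at
// the next higher offset rather than completing the callback. FetchResult and
// the scheduler are illustrative stand-ins, not Kafka's real types.
import java.util.List;
import java.util.function.Consumer;

public class EmptyRemoteFetchSketch {
    record FetchResult(long fetchOffset, List<byte[]> records) {}

    interface DelayedFetchScheduler {
        void scheduleAt(long offset); // re-enqueue a delayed fetch at this offset
    }

    static void onRemoteFetchComplete(FetchResult result,
                                      DelayedFetchScheduler scheduler,
                                      Consumer<FetchResult> callback) {
        if (result.records().isEmpty()) {
            // Don't complete with empty records; retry from the next higher offset.
            scheduler.scheduleAt(result.fetchOffset() + 1);
        } else {
            callback.accept(result);
        }
    }
}
{code}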




was (Author: JIRAUSER301926):
[~satish.duggana] [~divijvaidya] [~ckamal] I am going to start working on it. 
As per my understanding of the code, the next fetch offset is incremented only 
when the response contains some records. 

In the case mentioned above, I was thinking of introducing a delayed fetch for 
the next higher offset within DelayedRemoteFetch, instead of invoking the 
callback when the set of records is empty. Let me know if there are other 
possible options to handle this scenario. 



>  Not reading active segments  when RemoteFetch return Empty Records.
> 
>
> Key: KAFKA-16088
> URL: https://issues.apache.org/jira/browse/KAFKA-16088
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Critical
>  Labels: tiered-storage
>
> This issue is about also covering local log segments while finding the 
> segment for a specific offset, when the topic was compacted earlier but was 
> later changed to retention and onboarded to tiered storage.
> Please refer to this comment for details: 
> https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.

2024-01-28 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811643#comment-17811643
 ] 

Arpit Goyal edited comment on KAFKA-16088 at 1/28/24 9:51 AM:
--

[~satish.duggana] [~divijvaidya] [~ckamal] I am going to start working on it. 
As per my understanding of the code, the next fetch offset is incremented only 
when the response contains some records. 

In the case mentioned above, I was thinking of introducing a delayed fetch for 
the next higher offset within DelayedRemoteFetch, instead of invoking the 
callback when the set of records is empty. Let me know if there are other 
possible options to handle this scenario. 




was (Author: JIRAUSER301926):
[~satish.duggana] [~divijvaidya] [~ckamal] I am going to start working on it. 
As per my understanding of the code, the next offset is incremented only 
when the response contains some records. 

In the case mentioned above, I was thinking of introducing a delayed fetch for 
the next higher offset within DelayedRemoteFetch, instead of invoking the 
callback when the set of records is empty. Let me know if there are other 
possible options to handle this scenario. 



>  Not reading active segments  when RemoteFetch return Empty Records.
> 
>
> Key: KAFKA-16088
> URL: https://issues.apache.org/jira/browse/KAFKA-16088
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Critical
>  Labels: tiered-storage
>
> This issue is about also covering local log segments while finding the 
> segment for a specific offset, when the topic was compacted earlier but was 
> later changed to retention and onboarded to tiered storage.
> Please refer to this comment for details: 
> https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.

2024-01-28 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811643#comment-17811643
 ] 

Arpit Goyal commented on KAFKA-16088:
-

[~satish.duggana] [~divijvaidya] [~ckamal] I am going to start working on it. 
As per my understanding of the code, the next offset is incremented only 
when the response contains some records. 

In the case mentioned above, I was thinking of introducing a delayed fetch for 
the next higher offset within DelayedRemoteFetch, instead of invoking the 
callback when the set of records is empty. Let me know if there are other 
possible options to handle this scenario. 



>  Not reading active segments  when RemoteFetch return Empty Records.
> 
>
> Key: KAFKA-16088
> URL: https://issues.apache.org/jira/browse/KAFKA-16088
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Critical
>  Labels: tiered-storage
>
> This issue is about also covering local log segments while finding the 
> segment for a specific offset, when the topic was compacted earlier but was 
> later changed to retention and onboarded to tiered storage.
> Please refer to this comment for details: 
> https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.

2024-01-27 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-16088:
---

Assignee: Arpit Goyal

>  Not reading active segments  when RemoteFetch return Empty Records.
> 
>
> Key: KAFKA-16088
> URL: https://issues.apache.org/jira/browse/KAFKA-16088
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Critical
>  Labels: tiered-storage
>
> This issue is about also covering local log segments while finding the 
> segment for a specific offset, when the topic was compacted earlier but was 
> later changed to retention and onboarded to tiered storage.
> Please refer to this comment for details: 
> https://github.com/apache/kafka/pull/15060#pullrequestreview-1802495064



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.

2024-01-06 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803914#comment-17803914
 ] 

Arpit Goyal commented on KAFKA-16088:
-

cc [~divijvaidya] [~satish.duggana] [~christo_lolov] [~Kamal C]

>  Not reading active segments  when RemoteFetch return Empty Records.
> 
>
> Key: KAFKA-16088
> URL: https://issues.apache.org/jira/browse/KAFKA-16088
> Project: Kafka
>  Issue Type: Bug
>Reporter: Arpit Goyal
>Priority: Critical
>
> Please refer to this comment for details: 
> https://github.com/apache/kafka/pull/15060#issuecomment-1879657273



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16088) Not reading active segments when RemoteFetch return Empty Records.

2024-01-06 Thread Arpit Goyal (Jira)
Arpit Goyal created KAFKA-16088:
---

 Summary:  Not reading active segments  when RemoteFetch return 
Empty Records.
 Key: KAFKA-16088
 URL: https://issues.apache.org/jira/browse/KAFKA-16088
 Project: Kafka
  Issue Type: Bug
Reporter: Arpit Goyal


Please refer to this comment for details: 
https://github.com/apache/kafka/pull/15060#issuecomment-1879657273



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2024-01-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801687#comment-17801687
 ] 

Arpit Goyal commented on KAFKA-16063:
-

[~divijvaidya] [~showuon] PR for review:
https://github.com/apache/kafka/pull/15104

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png, Screenshot 2024-01-01 at 10.51.03 PM-1.png, 
> Screenshot 2024-01-01 at 10.51.03 PM.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2024-01-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801668#comment-17801668
 ] 

Arpit Goyal commented on KAFKA-16063:
-

[~divijvaidya] It seems to be a well-known leak, documented around the internet: 
https://blog.creekorful.org/2020/03/classloader-and-memory-leaks/
https://stackoverflow.com/questions/6385018/memory-leaks-with-addshutdownhook

The map is cleared only when JVM shutdown happens, at which point the 
registered hooks are executed and the map is discarded: 

{code:java}
static void runHooks() {
    Collection<Thread> threads;
    synchronized (ApplicationShutdownHooks.class) {
        threads = hooks.keySet();
        hooks = null;
    }

    for (Thread hook : threads) {
        hook.start();
    }
    for (Thread hook : threads) {
        while (true) {
            try {
                hook.join();
                break;
            } catch (InterruptedException ignored) {
            }
        }
    }
}
{code}

We do not have the thread reference needed to clear it manually: 
removeShutdownHook requires the Thread reference, which is created anonymously 
within the DefaultDirectoryService library code. 

{code:java}
Runtime.getRuntime().removeShutdownHook(thread)
{code}
The only feasible option, as far as this analysis goes, is to disable the 
shutdownHookEnabled flag. 
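A small illustration of that constraint using only the plain JDK API (a hypothetical example, not the ApacheDS code): removeShutdownHook succeeds only when the caller still holds the registered Thread instance.

{code:java}
// Hypothetical illustration: removing a hook needs the original Thread
// reference. A library that registers an anonymous Thread internally leaves
// callers with nothing to pass to removeShutdownHook.
public class RemoveHookSketch {
    public static void main(String[] args) {
        Thread hook = new Thread(() -> System.out.println("cleanup"));
        Runtime.getRuntime().addShutdownHook(hook);
        // Succeeds only because we kept the reference ourselves.
        boolean removed = Runtime.getRuntime().removeShutdownHook(hook);
        System.out.println("removed = " + removed);
    }
}
{code}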


> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png, Screenshot 2024-01-01 at 10.51.03 PM-1.png, 
> Screenshot 2024-01-01 at 10.51.03 PM.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2024-01-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801568#comment-17801568
 ] 

Arpit Goyal commented on KAFKA-16063:
-

[~divijvaidya] I am able to reproduce the issue locally and have also figured 
out the reason for the leak. 

While initializing the DirectoryService in MiniKdc.scala 
(https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/security/minikdc/MiniKdc.scala#L181), 
the DirectoryService checks the shutdownHookEnabled flag, which is enabled by 
default, and registers a hook in the ApplicationShutdownHooks IdentityHashMap: 

{code:java}
if ( shutdownHookEnabled )
{
    Runtime.getRuntime().addShutdownHook( new Thread( new Runnable()
    {
        public void run()
        {
            try
            {
                shutdown();
            }
            catch ( Exception e )
            {
                LOG.warn( "Failed to shut down the directory service: "
                    + DefaultDirectoryService.this.instanceId, e );
            }
        }
    }, "ApacheDS Shutdown Hook (" + instanceId + ')' ) );

    LOG.info( "ApacheDS shutdown hook has been registered with the runtime." );
}
{code}
But this map is never cleared by the DirectoryService shutdown method, which 
we call from the MiniKdc stop function. 

I think we can disable the shutdownHookEnabled flag, since the directory 
service's shutdown method is already called in the MiniKdc stop function:
{code:java}
ds.setShutdownHookEnabled(false)
{code}
Attached screenshot for reference: !Screenshot 2024-01-01 at 10.51.03 PM.png! 



> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png, Screenshot 2024-01-01 at 10.51.03 PM-1.png, 
> Screenshot 2024-01-01 at 10.51.03 PM.png
>
>
> All test extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0]
>  (unzip this and you will find a hprof which can be opened with your 
> favourite heap analyzer)
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect that the reason is because DefaultDirectoryService#startup() 
> registers a shutdownhook which is somehow messed up by 
> QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2024-01-01 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-16063:

Attachment: Screenshot 2024-01-01 at 10.51.03 PM.png

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png, Screenshot 2024-01-01 at 10.51.03 PM-1.png, 
> Screenshot 2024-01-01 at 10.51.03 PM.png
>
>
> All test extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0]
>  (unzip this and you will find a hprof which can be opened with your 
> favourite heap analyzer)
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect that the reason is because DefaultDirectoryService#startup() 
> registers a shutdownhook which is somehow messed up by 
> QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2024-01-01 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-16063:

Attachment: Screenshot 2024-01-01 at 10.51.03 PM-1.png

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png, Screenshot 2024-01-01 at 10.51.03 PM-1.png, 
> Screenshot 2024-01-01 at 10.51.03 PM.png
>
>
> All test extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0]
>  (unzip this and you will find a hprof which can be opened with your 
> favourite heap analyzer)
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect that the reason is because DefaultDirectoryService#startup() 
> registers a shutdownhook which is somehow messed up by 
> QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-31 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801508#comment-17801508
 ] 

Arpit Goyal commented on KAFKA-16063:
-

Thanks [~divijvaidya]. I was attaching the wrong process in IntelliJ. 

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-30 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801408#comment-17801408
 ] 

Arpit Goyal edited comment on KAFKA-16063 at 12/31/23 2:08 AM:
---

[~divijvaidya] I am trying to reproduce the issue using the IntelliJ profiler. 
I executed this command: 

{code:java}
./gradlew -PmaxParallelForks=1 -PmaxScalacThreads=1 :core:test
{code}

But I didn't see heap memory rising. I ran it for around 7-8 hours in total. 
Am I doing something wrong?
Please find the screenshot attached. 






was (Author: JIRAUSER301926):
[~divijvaidya] I am trying to reproduce the issue using the IntelliJ profiler. 
I executed this command: 

{code:java}
./gradlew -PmaxParallelForks=1 -PmaxScalacThreads=1 :core:test
{code}

But I didn't see heap memory rising. I ran it for around 7-8 hours in total. 
Am I doing something wrong?
Please find the screenshot attached: !Screenshot 2023-12-31 at 7.19.25 AM.png! 



> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png, Screenshot 
> 2023-12-31 at 7.19.25 AM.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-30 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801408#comment-17801408
 ] 

Arpit Goyal edited comment on KAFKA-16063 at 12/31/23 2:08 AM:
---

[~divijvaidya] I am trying to reproduce the issue using the IntelliJ profiler. 
I executed this command: 

{code:java}
./gradlew -PmaxParallelForks=1 -PmaxScalacThreads=1 :core:test
{code}

But I didn't see heap memory rising. I ran it for around 7-8 hours in total. 
Am I doing something wrong?
Please find the screenshot attached: !Screenshot 2023-12-31 at 7.19.25 AM.png! 




was (Author: JIRAUSER301926):
[~divijvaidya] I am trying to reproduce the issue using the IntelliJ profiler. 
I executed this command: 

{code:java}
./gradlew -PmaxParallelForks=1 -PmaxScalacThreads=1 :core:test
{code}

But I didn't see heap memory rising. I ran it for around 7-8 hours in total. 
Am I doing something wrong?
Please find the screenshot attached: !Screenshot 2023-12-31 at 7.19.25 AM.png! 



> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-30 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801408#comment-17801408
 ] 

Arpit Goyal commented on KAFKA-16063:
-

[~divijvaidya] I am trying to reproduce the issue using the IntelliJ profiler. 
I executed this command: 

{code:java}
./gradlew -PmaxParallelForks=1 -PmaxScalacThreads=1 :core:test
{code}

But I didn't see heap memory rising. I ran it for around 7-8 hours in total. 
Am I doing something wrong?
Please find the screenshot attached: !Screenshot 2023-12-31 at 7.19.25 AM.png! 



> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-29 Thread Arpit Goyal (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16063 ]


Arpit Goyal deleted comment on KAFKA-16063:
-

was (Author: JIRAUSER301926):
[~divijvaidya] It seems the IntelliJ profiler is available in the Ultimate 
edition and not in the Community edition.

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-29 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801295#comment-17801295
 ] 

Arpit Goyal commented on KAFKA-16063:
-

[~divijvaidya] It seems the IntelliJ profiler is available in the Ultimate 
edition and not in the Community edition.

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-29 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801156#comment-17801156
 ] 

Arpit Goyal commented on KAFKA-16063:
-

Thanks [~divijvaidya]. I am picking it up. 

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests

2023-12-29 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-16063:
---

Assignee: Arpit Goyal

> Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
> -
>
> Key: KAFKA-16063
> URL: https://issues.apache.org/jira/browse/KAFKA-16063
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-12-29 at 12.38.29.png
>
>
> All tests extending `EndToEndAuthorizationTest` are leaking 
> DefaultDirectoryService objects.
> This can be observed using the heap dump at 
> [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb&dl=0]
>  (unzip this and you will find an hprof file which can be opened with your 
> favourite heap analyzer).
> The stack trace looks like this:
> !Screenshot 2023-12-29 at 12.38.29.png!
>  
> I suspect the reason is that DefaultDirectoryService#startup() registers a 
> shutdown hook which is somehow messed up by QuorumTestHarness#teardown().
> We need to investigate why this is leaking and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test

2023-12-28 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801106#comment-17801106
 ] 

Arpit Goyal commented on KAFKA-16059:
-

[~divijvaidya] Which tool or command are you using to generate and analyze the 
heap dump? Can you share the command? 

> Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
> --
>
> Key: KAFKA-16059
> URL: https://issues.apache.org/jira/browse/KAFKA-16059
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
>
> We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the 
> tests in :core:test
> {code:java}
> "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition
>      java.lang.Thread.State: TIMED_WAITING
>  on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67
>     at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
>     at 
> java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
>     at 
> java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
>     at 
> java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265)
>     at 
> app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87)
>     at 
> app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418)
>     at 
> app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444)
>     at 
> app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
>  {code}
> The objective of this Jira is to identify the test and fix this leak.
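As a hedged sketch of one way to identify the offending test (an assumed approach, not part of this ticket): snapshot live thread names before and after a suspected test and flag leftover reaper threads.

{code:java}
// Illustrative only: the thread-name filter is an assumption based on the
// stack trace above, and the wiring into the real test harness is omitted.
import java.util.Set;
import java.util.stream.Collectors;

public class ThreadLeakCheckSketch {
    static Set<String> reaperThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.contains("ExpirationReaper"))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> before = reaperThreads();
        // ... run the suspected test here ...
        Set<String> after = reaperThreads();
        after.removeAll(before);
        if (!after.isEmpty())
            throw new AssertionError("Leaked reaper threads: " + after);
    }
}
{code}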



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test

2023-12-28 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-16059:
---

Assignee: Arpit Goyal

> Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
> --
>
> Key: KAFKA-16059
> URL: https://issues.apache.org/jira/browse/KAFKA-16059
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Major
>
> We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the 
> tests in :core:test
> {code:java}
> "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition
>      java.lang.Thread.State: TIMED_WAITING
>  on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67
>     at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
>     at 
> java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
>     at 
> java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
>     at 
> java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265)
>     at 
> app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87)
>     at 
> app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418)
>     at 
> app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444)
>     at 
> app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
>  {code}
> The objective of this Jira is to identify the test and fix this leak.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15529) Flaky test ReassignReplicaShrinkTest.executeTieredStorageTest

2023-12-25 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800284#comment-17800284
 ] 

Arpit Goyal commented on KAFKA-15529:
-

[~divijvaidya] [~Kamal C] Any idea under which conditions the above failure 
scenario can occur? I can try reproducing it locally. 

> Flaky test ReassignReplicaShrinkTest.executeTieredStorageTest
> -
>
> Key: KAFKA-15529
> URL: https://issues.apache.org/jira/browse/KAFKA-15529
> Project: Kafka
>  Issue Type: Test
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.7.0
>
>
> Example of failed CI build - 
> [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14449/3/testReport/junit/org.apache.kafka.tiered.storage.integration/ReassignReplicaShrinkTest/Build___JDK_21_and_Scala_2_13___executeTieredStorageTest_String__quorum_kraft_2/]
>   
> {noformat}
> org.opentest4j.AssertionFailedError: Number of fetch requests from broker 0 
> to the tier storage does not match the expected value for topic-partition 
> topicA-1 ==> expected: <3> but was: <4>
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>   at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
>   at 
> app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:559)
>   at 
> app//org.apache.kafka.tiered.storage.actions.ConsumeAction.doExecute(ConsumeAction.java:128)
>   at 
> app//org.apache.kafka.tiered.storage.TieredStorageTestAction.execute(TieredStorageTestAction.java:25)
>   at 
> app//org.apache.kafka.tiered.storage.TieredStorageTestHarness.executeTieredStorageTest(TieredStorageTestHarness.java:112){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15301) [Tiered Storage] Historically compacted topics send request to remote for active segment during consume

2023-12-22 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799742#comment-17799742
 ] 

Arpit Goyal commented on KAFKA-15301:
-

[~mital.awachat] Does this ticket address your concern?
https://issues.apache.org/jira/browse/KAFKA-15388

> [Tiered Storage] Historically compacted topics send request to remote for 
> active segment during consume
> ---
>
> Key: KAFKA-15301
> URL: https://issues.apache.org/jira/browse/KAFKA-15301
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.6.0
>Reporter: Mital Awachat
>Assignee: Jimmy Wang
>Priority: Major
> Fix For: 3.7.0
>
>
> I have a use case where tiered storage plugin received requests for active 
> segments. The topics for which it happened were historically compacted topics 
> for which compaction was disabled and tiering was enabled.
> Create topic with compact cleanup policy -> Produce data with few repeat keys 
> and create multiple segments -> let compaction happen -> change cleanup 
> policy to delete -> produce some more data for segment rollover -> enable 
> tiering on topic -> wait for segments to be uploaded to remote storage and 
> cleanup from local (active segment would remain), consume from beginning -> 
> Observe logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-12-21 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799532#comment-17799532
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] [~satish.duggana] [~showuon] PR for review: 
https://github.com/apache/kafka/pull/15060

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-12-21 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799399#comment-17799399
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 12/21/23 12:50 PM:


[~enether] [~satish.duggana] [~divijvaidya] I am currently working on the fix. 
Hopefully I will create a PR by EOD.


was (Author: JIRAUSER301926):
[~enether] [~satish.duggana] [~divijvaidya] I am currently working on the fix. 
Hopefully I will create a PR by Saturday.

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-12-21 Thread Arpit Goyal (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-15388 ]


Arpit Goyal deleted comment on KAFKA-15388:
-

was (Author: JIRAUSER301926):
[~showuon] [~satish.duggana] [~christo_lolov] Any suggestion on how to fetch 
the next higher segment for a particular offset from remote storage, the same 
way we do in local storage?

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-12-21 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799399#comment-17799399
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~enether] [~satish.duggana] [~divijvaidya] I am currently working on the fix. 
Hopefully I will create a PR by Saturday.

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-12-14 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796737#comment-17796737
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~showuon] [~satish.duggana] [~christo_lolov] Any suggestion on how to fetch 
the next higher segment for a particular offset from remote storage, the same 
way we do in local storage?

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-21 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788496#comment-17788496
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] [~satish.duggana] [~showuon] Any suggestion on how to fetch the 
RemoteLogSegmentMetadata of the segment with the smallest offset strictly 
greater than a given offset? As of now, the only option I see is to list all 
RemoteLogSegmentMetadata and filter it.

Should we introduce another method in RemoteLogMetadataManager to fetch the 
next higher segment for a given offset and leader epoch?
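A sketch of what such a lookup could look like, with a hypothetical method name and a stand-in type for RemoteLogSegmentMetadata (not the real RemoteLogMetadataManager API): a NavigableMap keyed by base offset yields the strictly-greater segment directly, without listing everything.

{code:java}
// Hypothetical sketch of a "next higher segment" lookup for one leader epoch.
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

public class HigherSegmentLookupSketch {
    record SegmentMetadata(long baseOffset, long endOffset) {}

    // Segments for a single leader epoch, indexed by base offset.
    private final NavigableMap<Long, SegmentMetadata> segments = new TreeMap<>();

    // Metadata of the segment with the smallest base offset strictly greater
    // than the given offset, if any.
    public Optional<SegmentMetadata> higherSegment(long offset) {
        return Optional.ofNullable(segments.higherEntry(offset))
                       .map(Map.Entry::getValue);
    }
}
{code}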

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786299#comment-17786299
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 11/15/23 10:38 AM:


Hey [~divijvaidya] I was able to reproduce the issue locally.
*Configuration*
1. Create topic test20 with partition 0.
2. Set segment.bytes to 100 for quicker rolling of segments:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config segment.bytes=100
{code}
3. Enable the compact cleanup policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config cleanup.policy=compact
{code}
4. *Produce some messages:*
{code:java}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test20 --property "parse.key=true" --property "key.separator=:"
>1:1
>1:2
>1:3
>1:4
>1:5
>1:6
>1:7
>1:8
>1:9
>1:10
>1:11
>1:12
>1:13
>1:14
>1:15
>1:16
{code}
5. Consume messages from the topic before remote storage is enabled:
{code:java}
(base) ➜  kafka git:(trunk) ✗ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
Offset:14   15
Offset:15   16
{code}
6. Disable the compact policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --delete-config cleanup.policy
{code}
7. Enable remote storage for the topic:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config remote.storage.enable=true
{code}
8. The rolled-over segments have been moved to the tiered storage topic; 
screenshot attached for reference (tieredtopicloglist).
9. Running the console consumer for the topic now returns nothing:
{code:java}
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
{code}

10. As this was a historically compacted topic with only key 1, most of the log 
segments have zero bytes; please refer to the (tieredtopicloglist) screenshot. 
Only 14.log and 15.log contain data.
11. When we try to fetch from remote storage, the code looks for the first 
batch assuming it always exists, but in this case 0.log has zero bytes and the 
batch is null because the segment is empty (the condition 0 >= 0 - 11 holds):
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileLogInputStream.java#L65
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1411
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1277

12. This seems to be a critical bug, and the behaviour should be in line with 
the local log fetch mechanism, i.e. if the batch is null, it should fetch the 
next higher segment and continue until the end offset.

Should I create a separate bug for this? The functionality is broken here.
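
To make the idea in step 12 concrete, here is a rough sketch of the fallback I 
have in mind (findFirstBatch and findNextSegmentMetadata are helper names 
describing the intent, not necessarily the actual RemoteLogManager internals; 
fetchLogSegment is the existing RemoteStorageManager API):
{code:java}
// Sketch: when the segment located for fetchOffset yields no batch (e.g. an empty,
// historically compacted segment), fall back to the next remote segment instead of
// returning an empty result.
private RecordBatch findFirstBatchAcrossSegments(RemoteLogSegmentMetadata start,
                                                 int startPosition,
                                                 long fetchOffset) throws Exception {
    RemoteLogSegmentMetadata segment = start;
    int position = startPosition;
    while (segment != null) {
        try (InputStream in = remoteStorageManager.fetchLogSegment(segment, position)) {
            RecordBatch firstBatch = findFirstBatch(in, fetchOffset);
            if (firstBatch != null)
                return firstBatch;
        }
        segment = findNextSegmentMetadata(segment); // hypothetical helper: next higher remote segment
        position = 0;                               // scan the next segment from its beginning
    }
    return null; // no batch at or after fetchOffset exists in remote storage
}
{code}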








was (Author: JIRAUSER301926):
Hey [~divijvaidya] I was able to reproduce the issue locally.
*Configuration*
1. Create topic test20 with partition 0.
2. Set segment.bytes to 100 for quicker rolling of segments:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config segment.bytes=100
{code}
3. Enable the compact cleanup policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config cleanup.policy=compact
{code}
4. *Produce some messages:*
{code:java}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test20 --property "parse.key=true" --property "key.separator=:"
>1:1
>1:2
>1:3
>1:4
>1:5
>1:6
>1:7
>1:8
>1:9
>1:10
>1:11
>1:12
>1:13
>1:14
>1:15
>1:16
{code}
5. Consume messages from the topic before remote storage is enabled:
{code:java}
(base) ➜  kafka git:(trunk) ✗ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
Offset:14   15
Offset:15   16
{code}
6. Disable the compact policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --delete-config cleanup.policy
{code}
7. Enable remote storage for the topic:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config remote.storage.enable=true
{code}
8. The rolled-over segments have been moved to the tiered storage topic; 
screenshot attached for reference (tieredtopicloglist).
9. As this was a historically compacted topic with only key 1, most of the log 
segments have zero bytes.

[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786299#comment-17786299
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 11/15/23 10:37 AM:


Hey [~divijvaidya] I was able to reproduce the issue locally.
*Configuration*
1. Create topic test20 with partition 0.
2. Set segment.bytes to 100 for quicker rolling of segments:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config segment.bytes=100
{code}
3. Enable the compact cleanup policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config cleanup.policy=compact
{code}
4. *Produce some messages:*
{code:java}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test20 --property "parse.key=true" --property "key.separator=:"
>1:1
>1:2
>1:3
>1:4
>1:5
>1:6
>1:7
>1:8
>1:9
>1:10
>1:11
>1:12
>1:13
>1:14
>1:15
>1:16
{code}
5. Consume messages from the topic before remote storage is enabled:
{code:java}
(base) ➜  kafka git:(trunk) ✗ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
Offset:14   15
Offset:15   16
{code}
6. Disable the compact policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --delete-config cleanup.policy
{code}
7. Enable remote storage for the topic:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config remote.storage.enable=true
{code}
8. The rolled-over segments have been moved to the tiered storage topic; 
screenshot attached for reference (tieredtopicloglist).
9. As this was a historically compacted topic with only key 1, most of the log 
segments have zero bytes; please refer to the (tieredtopicloglist) screenshot. 
Only 14.log and 15.log contain data.
10. When we try to fetch from remote storage, the code looks for the first 
batch assuming it always exists, but in this case 0.log has zero bytes and the 
batch is null because the segment is empty (the condition 0 >= 0 - 11 holds):
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileLogInputStream.java#L65
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1411
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1277

11. This seems to be a critical bug, and the behaviour should be in line with 
the local log fetch mechanism, i.e. if the batch is null, it should fetch the 
next higher segment and continue until the end offset.

Should I create a separate bug for this? The functionality is broken here.








was (Author: JIRAUSER301926):
Hey [~divijvaidya] I was able to reproduce the issue locally.
*Configuration*
1. Create topic test20 with partition 0.
2. Set segment.bytes to 100 for quicker rolling of segments:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config segment.bytes=100
{code}
3. Enable the compact cleanup policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config cleanup.policy=compact
{code}
4. *Produce some messages:*
{code:java}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test20 --property "parse.key=true" --property "key.separator=:"
>1:1
>1:2
>1:3
>1:4
>1:5
>1:6
>1:7
>1:8
>1:9
>1:10
>1:11
>1:12
>1:13
>1:14
>1:15
>1:16
{code}
5. Consume messages from the topic before remote storage is enabled:
{code:java}
(base) ➜  kafka git:(trunk) ✗ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
Offset:14   15
Offset:15   16
{code}
6. Disable the compact policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --delete-config cleanup.policy
{code}
7. Enable remote storage for the topic:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config remote.storage.enable=true
{code}
8. The rolled-over segments have been moved to the tiered storage topic; 
screenshot attached for reference (tieredtopicloglist).
9. As this was a historically compacted topic with only key 1, most of the log 
segments have zero bytes; please refer to the (tieredtopicloglist) screenshot. 
Only 14.log and 15.log contain data.
10. When we try to fetch from remote storage, the code looks for the first 
batch assuming it always exists, but in this case the batch is null because the 
segment is empty.

[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786299#comment-17786299
 ] 

Arpit Goyal commented on KAFKA-15388:
-

Hey [~divijvaidya] I was able to reproduce the issue locally.
*Configuration*
1. Create topic test20 with partition 0.
2. Set segment.bytes to 100 for quicker rolling of segments:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config segment.bytes=100
{code}
3. Enable the compact cleanup policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config cleanup.policy=compact
{code}
4. *Produce some messages:*
{code:java}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test20 --property "parse.key=true" --property "key.separator=:"
>1:1
>1:2
>1:3
>1:4
>1:5
>1:6
>1:7
>1:8
>1:9
>1:10
>1:11
>1:12
>1:13
>1:14
>1:15
>1:16
{code}
5. Consume messages from the topic before remote storage is enabled:
{code:java}
(base) ➜  kafka git:(trunk) ✗ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
Offset:14   15
Offset:15   16
{code}
6. Disable the compact policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --delete-config cleanup.policy
{code}
7. Enable remote storage for the topic:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config remote.storage.enable=true
{code}
8. The rolled-over segments have been moved to the tiered storage topic; 
screenshot attached for reference (tieredtopicloglist).
9. As this was a historically compacted topic with only key 1, most of the log 
segments have zero bytes; please refer to the (tieredtopicloglist) screenshot. 
Only 14.log and 15.log contain data.
10. When we try to fetch from remote storage, the code looks for the first 
batch assuming it always exists, but in this case the batch always returns null 
because the segment is empty (the condition 0 >= 0 - 11 holds):
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileLogInputStream.java#L65
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1411
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1277

11. This seems to be a critical bug, and the behaviour should be in line with 
the local log fetch mechanism, i.e. if the segment is null, it should fetch the 
next higher segment and continue until the end offset.

Should I create a separate bug for this?







> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-15388:

Attachment: tieredtopicloglist.png

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-15388:

Attachment: (was: Screenshot 2023-11-15 at 3.53.43 PM.png)

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, 
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-15388:

Attachment: Screenshot 2023-11-15 at 3.53.43 PM.png

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png, Screenshot 
> 2023-11-15 at 3.53.43 PM.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-15 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-15388:

Attachment: Screenshot 2023-11-15 at 3.47.54 PM.png

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-04 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782907#comment-17782907
 ] 

Arpit Goyal commented on KAFKA-15388:
-

I just found the answer to the above query:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/clients/src/main/java/org/apache/kafka/common/record/RecordBatch.java#L118
The last offset of a record batch will always remain equal to its value from 
before compaction.
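
A quick way to observe this locally (a sketch using the public FileRecords API; 
the segment path is illustrative, pointing at a rolled segment of test20-0 from 
the reproduction above):
{code:java}
import java.io.File;
import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.FileLogInputStream.FileChannelRecordBatch;

public class InspectSegment {
    public static void main(String[] args) throws Exception {
        // Illustrative path: a compacted, rolled segment of test20-0.
        try (FileRecords records = FileRecords.open(
                new File("/tmp/kafka-logs/test20-0/00000000000000000000.log"))) {
            for (FileChannelRecordBatch batch : records.batches()) {
                // After compaction, lastOffset() still reports the pre-compaction value,
                // so consecutive batches can show gaps, e.g. [33-38], [42-49], [51-56].
                System.out.printf("baseOffset=%d lastOffset=%d%n",
                        batch.baseOffset(), batch.lastOffset());
            }
        }
    }
}
{code}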

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-04 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782860#comment-17782860
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 11/4/23 11:45 AM:
---

[~divijvaidya] [~satish.duggana] [~christo_lolov] [~showuon] Can anyone help me 
with the above query? It would help me proceed further.
*Summary*
In a given segment we try to find the right record batch for the requested 
offset, but the end offset of a given batch may already have been compacted 
away. For example, say we are looking to fetch data for offset 50, and the 
segment contains record batches with the following start and end offsets, where 
the 50th offset was historically compacted:
RB1[33-38]
RB2[42-49]
RB3[51-56]
If we try to fetch data for offset 50, it returns null. Is this expected, or a 
bug? If it is a bug, how do we identify which record batch should be returned?
https://github.com/satishd/kafka/blob/46c96f4868d51c84b43003bbb80bc07297016912/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1339
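
A sketch of the lookup rule that would handle the compacted case (written 
against the public Records API; this is my illustration, not the actual 
RemoteLogManager code):
{code:java}
import org.apache.kafka.common.record.RecordBatch;
import org.apache.kafka.common.record.Records;

public final class BatchLookup {
    // Returns the first batch whose lastOffset is >= target, i.e. RB3[51-56] for
    // target 50 in the example above, instead of null when no batch contains the
    // target offset exactly.
    public static RecordBatch firstBatchAtOrAfter(Records records, long target) {
        for (RecordBatch batch : records.batches()) {
            if (batch.lastOffset() >= target)
                return batch;
        }
        return null; // target is beyond the last offset of this segment
    }
}
{code}
With this rule, a fetch for offset 50 lands on RB3 and the consumer simply sees 
offsets 51-56 next, which matches local log behaviour.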



was (Author: JIRAUSER301926):
[~divijvaidya] [~satish.duggana] [~christo_lolov] [~showuon] Can anyone help me 
with the above query? It would help me proceed further.
To summarize:
In a given segment we try to find the right record batch for the requested 
offset, but the end offset of a given batch may already have been compacted 
away. For example, say we are looking to fetch data for offset 50, and the 
segment contains record batches with the following start and end offsets, where 
the 50th offset was historically compacted:
RB1[33-38]
RB2[42-49]
RB3[51-56]
If we try to fetch data for offset 50, it returns null. Is this expected, or a 
bug? If it is a bug, how do we identify which record batch should be returned?
https://github.com/satishd/kafka/blob/46c96f4868d51c84b43003bbb80bc07297016912/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1339


> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-04 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782860#comment-17782860
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 11/4/23 11:45 AM:
---

[~divijvaidya] [~satish.duggana] [~christo_lolov] [~showuon] Can anyone help me 
with the above query? It would help me proceed further.
To summarize:
In a given segment we try to find the right record batch for the requested 
offset, but the end offset of a given batch may already have been compacted 
away. For example, say we are looking to fetch data for offset 50, and the 
segment contains record batches with the following start and end offsets, where 
the 50th offset was historically compacted:
RB1[33-38]
RB2[42-49]
RB3[51-56]
If we try to fetch data for offset 50, it returns null. Is this expected, or a 
bug? If it is a bug, how do we identify which record batch should be returned?
https://github.com/satishd/kafka/blob/46c96f4868d51c84b43003bbb80bc07297016912/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1339



was (Author: JIRAUSER301926):
[~divijvaidya] [~satish.duggana] [~christo_lolov] [~showuon] Can anyone help me 
with the above query? It would help me proceed further.
To summarize:
In a given segment we try to find the right record batch for the requested 
offset, but the end offset of a given batch may already have been compacted 
away. For example, say we are looking to fetch data for offset 50, and the 
segment contains record batches with the following start and end offsets, where 
the 50th offset was historically compacted:
RB1[33-38]
RB2[42-49]
RB3[51-56]
If we try to fetch data for offset 50, it returns null. Is this expected, or a 
bug? If it is a bug, how do we identify which record batch should be returned?


> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-04 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782860#comment-17782860
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] [~satish.duggana] [~christo_lolov] [~showuon] Can anyone help me 
with the above query? It would help me proceed further.
To summarize:
In a given segment we try to find the right record batch for the requested 
offset, but the end offset of a given batch may already have been compacted 
away. For example, say we are looking to fetch data for offset 50, and the 
segment contains record batches with the following start and end offsets, where 
the 50th offset was historically compacted:
RB1[33-38]
RB2[42-49]
RB3[51-56]
If we try to fetch data for offset 50, it returns null. Is this expected, or a 
bug? If it is a bug, how do we identify which record batch should be returned?


> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-03 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782492#comment-17782492
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] [~christo_lolov] I need more help understanding the use case 
here, on this line:
https://github.com/satishd/kafka/blob/46c96f4868d51c84b43003bbb80bc07297016912/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1339

Let's say we are fetching the data for offset k:
1. We try to find the leader epoch for the requested offset k.
2. Using the leader epoch and offset k, we try to find the corresponding 
RemoteLogSegmentMetadata.
3. Using the RemoteLogSegmentMetadata and the offset index, we find the 
position of the highest possible entry less than or equal to the requested 
offset k.
4. Using the start position fetched in step 3, we fetch a remote segment 
InputStream from the RemoteStorageManager and then try to find the record batch 
in which our offset lies (the whole flow is sketched below).
The same important question arises here: if it is a historically compacted 
topic and the record batch's last offset is a compacted one, should we return 
the nearest following batch instead of an empty result?
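
Schematically, the read path in question looks something like this 
(leaderEpochFor, lookupPosition and findFirstBatch are hypothetical helper 
names describing the steps above, not the exact RemoteLogManager internals; 
remoteLogSegmentMetadata and fetchLogSegment are the existing plugin APIs):
{code:java}
// 1. Leader epoch for the requested offset (from the leader epoch cache).
int epoch = leaderEpochFor(fetchOffset);
// 2. Remote segment metadata covering (epoch, fetchOffset).
RemoteLogSegmentMetadata meta = remoteLogMetadataManager
        .remoteLogSegmentMetadata(topicIdPartition, epoch, fetchOffset)
        .orElseThrow(() -> new OffsetOutOfRangeException("no remote segment for " + fetchOffset));
// 3. Offset index lookup: file position of the greatest index entry <= fetchOffset.
int startPosition = lookupPosition(meta, fetchOffset);
// 4. Stream the segment from that position and scan for the batch containing fetchOffset.
try (InputStream in = remoteStorageManager.fetchLogSegment(meta, startPosition)) {
    RecordBatch batch = findFirstBatch(in, fetchOffset); // null in the compacted case above
    // If batch == null we currently return empty records; the question is whether we
    // should instead return the next available batch (RB3 in the example above).
}
{code}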

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781751#comment-17781751
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 11/2/23 5:04 AM:
--

[~divijvaidya] I am not following the archival point. By archival, do you mean 
once the segments have expired because of the retention mechanism?

Do you mean we should take care of the end-offset value while reading from 
remote storage, i.e. read the offset from the next segment's base offset 
instead of deriving it from the end offset? I found two usages:
1. While updating the logStartOffset during cleanup of log segments based on 
retention:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L827
2. During the remote storage read path:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1263
Do we need to correct both places?



was (Author: JIRAUSER301926):
[~divijvaidya] I am not following the archival point. By archival, do you mean 
that copying segments would not be impacted?

Do you mean we should take care of the end-offset value while reading from 
remote storage, i.e. read the offset from the next segment's base offset 
instead of deriving it from the end offset? I found two usages:
1. While updating the logStartOffset during cleanup of log segments based on 
retention:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L827
2. During the remote storage read path:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1263
Do we need to correct both places?


> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781751#comment-17781751
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] I am not following the archival point. By archival, do you mean 
that copying segments would not be impacted?

Do you mean we should take care of the end-offset value while reading from 
remote storage, i.e. read the offset from the next segment's base offset 
instead of deriving it from the end offset? I found two usages:
1. While updating the logStartOffset during cleanup of log segments based on 
retention:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L827
2. During the remote storage read path:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1263
Do we need to correct both places?


> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781616#comment-17781616
 ] 

Arpit Goyal edited comment on KAFKA-15388 at 11/1/23 6:43 AM:
--

[~divijvaidya] [~satish.duggana] Correct me if I am wrong here: we also need to 
fix this while copying segments:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L693
This also assumes the end offset of a given segment is (nextOffsetSegment - 1), 
which cannot be true if it is a historically compacted segment.
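
For illustration, a segment's true last offset can be read from its final batch 
rather than inferred from the next segment's base offset (a sketch using the 
public FileRecords API, not the actual copy-path code):
{code:java}
import java.io.File;
import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.RecordBatch;

public final class SegmentEndOffset {
    // Walks the batches of a segment file and returns the last batch's lastOffset.
    // For a historically compacted segment this can be smaller than
    // nextSegmentBaseOffset - 1, which is what the copy path currently assumes.
    public static long trueEndOffset(File segmentFile) throws Exception {
        try (FileRecords records = FileRecords.open(segmentFile)) {
            long end = -1L;
            for (RecordBatch batch : records.batches())
                end = batch.lastOffset();
            return end; // -1 for an empty (fully compacted) segment
        }
    }
}
{code}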


was (Author: JIRAUSER301926):
[~divijvaidya] [~satish.duggana] Correct me if I am wrong here: we also need to 
fix this while copying segments:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L693
This also assumes the end offset of a given segment is (nextOffsetSegment - 1).

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-11-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781616#comment-17781616
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] [~satish.duggana] Correct me if I am wrong here: we also need to 
fix this while copying segments:
https://github.com/apache/kafka/blob/5785796f985aa294c12e670da221d086a7fa887c/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L693
This also assumes the end offset of a given segment is (nextOffsetSegment - 1).

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-10-25 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779623#comment-17779623
 ] 

Arpit Goyal commented on KAFKA-15388:
-

Thanks [~divijvaidya] for providing the initial steps required to work on this; 
I will go through them. Do you remember any KIP created in the past related to 
the compaction feature?

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-10-23 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778884#comment-17778884
 ] 

Arpit Goyal commented on KAFKA-15388:
-

[~divijvaidya] Sure, I am picking it up.

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

2023-10-23 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-15388:
---

Assignee: Arpit Goyal

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> 
>
> Key: KAFKA-15388
> URL: https://issues.apache.org/jira/browse/KAFKA-15388
> Project: Kafka
>  Issue Type: Bug
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Blocker
> Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offset from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offset after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-10-04 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771713#comment-17771713
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] [~satish.duggana] [~showuon] I have added a few test cases. 
Please review the PR:
https://github.com/apache/kafka/pull/14482/files

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15481) Concurrency bug in RemoteIndexCache leads to IOException

2023-10-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770830#comment-17770830
 ] 

Arpit Goyal edited comment on KAFKA-15481 at 10/1/23 3:27 PM:
--

[~divijvaidya] [~showuon] 
1. We want to ensure that removing the entry from the cache and renaming the 
file to add the "deleted" suffix are atomic operations, OR
2. We can append a UUID to the name of each index file downloaded from remote 
storage for this cache, so that each entry in the cache has a unique file 
associated with it.
Should we go with the first or the second approach? IMO the second would 
decouple the flow, spare us from worrying about any other concurrency issue, 
and be simpler to understand.
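To illustrate the second approach, a minimal sketch (the naming scheme is 
hypothetical, not an actual Kafka change): every fetch gets its own on-disk 
file, so the async deletion of an evicted entry can never collide with a 
freshly fetched entry reusing the same name.

{code:java}
// Hypothetical unique-name scheme for cache files: remoteLogSegmentId plus a random UUID.
String uniqueSuffix = UUID.randomUUID().toString();
File offsetIndexFile = new File(cacheDir,
        remoteLogSegmentMetadata.remoteLogSegmentId().id() + "_" + uniqueSuffix + ".index");
{code}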


was (Author: JIRAUSER301926):
[~divijvaidya] [~showuon] 
1. We want to ensure that removing the entry from the cache and renaming the 
file to add the "deleted" suffix are atomic operations, OR
2. We can append a UUID to the name of each index file downloaded from remote 
storage for this cache, so that each entry in the cache has a unique file 
associated with it.
Should we go with the first or the second approach? IMO the second would 
decouple the flow and be simpler to understand.

> Concurrency bug in RemoteIndexCache leads to IOException
> 
>
> Key: KAFKA-15481
> URL: https://issues.apache.org/jira/browse/KAFKA-15481
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.7.0
>
>
> RemoteIndexCache has a concurrency bug which leads to an IOException while 
> fetching data from the remote tier.
> Events below, in timeline order:
> Thread 1 (cache thread): invalidates the entry; the removalListener is invoked 
> asynchronously, so the files have not been renamed with the "deleted" suffix 
> yet.
> Thread 2 (fetch thread): tries to find the entry in the cache, doesn't find it 
> because it has been removed by 1, fetches the entry from S3, and writes it to 
> the existing file (using replace-existing).
> Thread 1: the async removalListener is invoked, acquires a lock on the old 
> entry (which has been removed from the cache), renames the file to "deleted", 
> and starts deleting it.
> Thread 2: tries to create the in-memory/mmapped index, but doesn't find the 
> file and hence creates a new file of size 2GB in the AbstractIndex 
> constructor. The JVM returns an error, as it won't allow creation of a 2GB 
> random-access file.
> *Potential Fix*
> Use an EvictionListener instead of a RemovalListener in the Caffeine cache, as 
> per the documentation:
> {quote} When the operation must be performed synchronously with eviction, use 
> {{Caffeine.evictionListener(RemovalListener)}} instead. This listener will 
> only be notified when {{RemovalCause.wasEvicted()}} is true. For an explicit 
> removal, {{Cache.asMap()}} offers compute methods that are performed 
> atomically.{quote}
> This will ensure that removing the entry from the cache and marking the file 
> with the "deleted" suffix are done synchronously, so the above race condition 
> will not occur.
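A minimal sketch of the suggested fix, assuming a simplified Entry shape (the 
real RemoteIndexCache entry type and its cleanup method may differ):

{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class EvictionListenerSketch {
    // Stand-in for RemoteIndexCache.Entry, for illustration only.
    interface Entry { void markForCleanup(); }

    private final Cache<String, Entry> cache = Caffeine.newBuilder()
            .maximumSize(1024)  // illustrative bound
            // Unlike removalListener, evictionListener runs synchronously with the eviction.
            .evictionListener((String key, Entry entry, RemovalCause cause) -> entry.markForCleanup())
            .build();

    void invalidate(String key) {
        // Explicit removals bypass the eviction listener; perform them atomically on the map view.
        cache.asMap().computeIfPresent(key, (k, entry) -> {
            entry.markForCleanup();  // rename to the "deleted" suffix before the async delete
            return null;             // returning null removes the mapping
        });
    }
}
{code}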



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15481) Concurrency bug in RemoteIndexCache leads to IOException

2023-10-01 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770830#comment-17770830
 ] 

Arpit Goyal commented on KAFKA-15481:
-

[~divijvaidya] [~showuon] 
1. We want to ensure that removing the entry from the cache and renaming the 
file to add the "deleted" suffix are atomic operations, OR
2. We can append a UUID to the name of each index file downloaded from remote 
storage for this cache, so that each entry in the cache has a unique file 
associated with it.
Should we go with the first or the second approach? IMO the second would 
decouple the flow and be simpler to understand.

> Concurrency bug in RemoteIndexCache leads to IOException
> 
>
> Key: KAFKA-15481
> URL: https://issues.apache.org/jira/browse/KAFKA-15481
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.7.0
>
>
> RemoteIndexCache has a concurrency bug which leads to an IOException while 
> fetching data from the remote tier.
> Events below, in timeline order:
> Thread 1 (cache thread): invalidates the entry; the removalListener is invoked 
> asynchronously, so the files have not been renamed with the "deleted" suffix 
> yet.
> Thread 2 (fetch thread): tries to find the entry in the cache, doesn't find it 
> because it has been removed by 1, fetches the entry from S3, and writes it to 
> the existing file (using replace-existing).
> Thread 1: the async removalListener is invoked, acquires a lock on the old 
> entry (which has been removed from the cache), renames the file to "deleted", 
> and starts deleting it.
> Thread 2: tries to create the in-memory/mmapped index, but doesn't find the 
> file and hence creates a new file of size 2GB in the AbstractIndex 
> constructor. The JVM returns an error, as it won't allow creation of a 2GB 
> random-access file.
> *Potential Fix*
> Use an EvictionListener instead of a RemovalListener in the Caffeine cache, as 
> per the documentation:
> {quote} When the operation must be performed synchronously with eviction, use 
> {{Caffeine.evictionListener(RemovalListener)}} instead. This listener will 
> only be notified when {{RemovalCause.wasEvicted()}} is true. For an explicit 
> removal, {{Cache.asMap()}} offers compute methods that are performed 
> atomically.{quote}
> This will ensure that removing the entry from the cache and marking the file 
> with the "deleted" suffix are done synchronously, so the above race condition 
> will not occur.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-29 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770491#comment-17770491
 ] 

Arpit Goyal commented on KAFKA-15169:
-

> Another one that comes to my mind is if we have a ".deleted" file on disk 
> (because files with the "deleted" suffix are removed asynchronously) and 
> another one with the same name is created (it would occur the second time the 
> entry with the same key gets invalidated). It should be a no-op in that case 
> (instead of throwing an error).

[~divijvaidya] This should be similar to what you mentioned long back:
>Another test:
1. Create an index and verify the file exists.
2. Invalidate the index from the cache, which will mark it for cleanup, i.e. 
change the suffix to .deleted.
3. Try to fetch the same index from the cache; the cache will not have it and 
will re-fetch it.
4. Invalidate this index.
5. If the cleaner thread is slow, the file from step 2 will still exist.
6. When we try to move the new file from step 4 to the same ".deleted" name, 
verify that it doesn't fail and instead overwrites the file (a minimal sketch 
of this step follows).
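A sketch of step 6 using plain java.nio (Files, Path, StandardCopyOption) in 
place of Utils.atomicMoveWithFallback, which attempts an atomic move and falls 
back to a replace-existing move: renaming onto an existing ".deleted" file 
should overwrite it rather than fail.

{code:java}
Path dir = Files.createTempDirectory("remote-index-cache-test");
// Leftover ".deleted" file from the first invalidation (cleaner thread is slow).
Path deleted = Files.write(dir.resolve("00000000000000000000.index.deleted"), new byte[]{1});
// Freshly fetched index about to be invalidated a second time.
Path fresh = Files.write(dir.resolve("00000000000000000000.index"), new byte[]{2});
// The second invalidation should overwrite the stale ".deleted" file, not throw.
Files.move(fresh, deleted, StandardCopyOption.REPLACE_EXISTING);
assert Files.exists(deleted) && !Files.exists(fresh);
{code}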

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15511) Exception not handled correctly if indexFile is corrupted.

2023-09-27 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769525#comment-17769525
 ] 

Arpit Goyal commented on KAFKA-15511:
-

[~divijvaidya] Can you review this?
https://github.com/apache/kafka/pull/14459/files


>  Exception not handled correctly if indexFile is corrupted.
> ---
>
> Key: KAFKA-15511
> URL: https://issues.apache.org/jira/browse/KAFKA-15511
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Major
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-09-27 at 1.01.58 PM.png, Screenshot 
> 2023-09-27 at 1.14.22 PM.png
>
>
> I was simulating a code flow where there is a possibility of inconsistency 
> between the RemoteIndexCache and the files stored on disk.
> 1. A corrupt OffsetIndex file already exists on disk.
> 2. No entry exists in the RemoteIndexCache.
> 3. Call getIndexEntry.
> 4. As the file already exists on disk and is corrupted, indexSanityCheck 
> throws *CorruptIndexException*.
> 5. But the code flow in RemoteIndexCache catches only *CorruptRecordException*.
> Ideally it should catch *CorruptIndexException* instead of 
> *CorruptRecordException*.
> Impact - functionality is broken in the above code flow: it will not be able 
> to auto-recover by overwriting the corrupted index file.
> Check the attached screenshots for more reference.
> cc [~divijvaidya] [~satish.duggana] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15511) Exception not handled correctly if indexFile is corrupted.

2023-09-27 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769482#comment-17769482
 ] 

Arpit Goyal commented on KAFKA-15511:
-

[~divijvaidya] Yes, working on it right away.

>  Exception not handled correctly if indexFile is corrupted.
> ---
>
> Key: KAFKA-15511
> URL: https://issues.apache.org/jira/browse/KAFKA-15511
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Major
> Fix For: 3.7.0
>
> Attachments: Screenshot 2023-09-27 at 1.01.58 PM.png, Screenshot 
> 2023-09-27 at 1.14.22 PM.png
>
>
> I was simulating a code flow where there is a possibility of inconsistency 
> between the RemoteIndexCache and the files stored on disk.
> 1. A corrupt OffsetIndex file already exists on disk.
> 2. No entry exists in the RemoteIndexCache.
> 3. Call getIndexEntry.
> 4. As the file already exists on disk and is corrupted, indexSanityCheck 
> throws *CorruptIndexException*.
> 5. But the code flow in RemoteIndexCache catches only *CorruptRecordException*.
> Ideally it should catch *CorruptIndexException* instead of 
> *CorruptRecordException*.
> Impact - functionality is broken in the above code flow: it will not be able 
> to auto-recover by overwriting the corrupted index file.
> Check the attached screenshots for more reference.
> cc [~divijvaidya] [~satish.duggana] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15511) Exception not handled correctly if indexFile is corrupted.

2023-09-27 Thread Arpit Goyal (Jira)
Arpit Goyal created KAFKA-15511:
---

 Summary:  Exception not handled correctly if indexFile is 
corrupted.
 Key: KAFKA-15511
 URL: https://issues.apache.org/jira/browse/KAFKA-15511
 Project: Kafka
  Issue Type: Bug
Reporter: Arpit Goyal
Assignee: Arpit Goyal
 Attachments: Screenshot 2023-09-27 at 1.01.58 PM.png, Screenshot 
2023-09-27 at 1.14.22 PM.png

I was simulating a code flow where there is a possibility of inconsistency 
between the RemoteIndexCache and the files stored on disk.


1. A corrupt OffsetIndex file already exists on disk.
2. No entry exists in the RemoteIndexCache.
3. Call getIndexEntry.
4. As the file already exists on disk and is corrupted, indexSanityCheck 
throws *CorruptIndexException*.

5. But the code flow in RemoteIndexCache catches only *CorruptRecordException*.

Ideally it should catch *CorruptIndexException* instead of 
*CorruptRecordException*.
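A minimal sketch of the fix, mirroring the loadIndexFile flow in 
RemoteIndexCache (names as in that class; only the caught exception type 
changes):

{code:java}
if (Files.exists(indexFile.toPath())) {
    try {
        // readIndex runs the index sanity check and throws CorruptIndexException on corruption.
        index = readIndex.apply(indexFile);
    } catch (CorruptIndexException ex) {  // previously CorruptRecordException, which never matched
        log.info("Error occurred while loading the stored index file {}", indexFile.getPath(), ex);
        // index stays null, so the code below re-fetches the file from remote storage.
    }
}
{code}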

Impact - functionality is broken in the above code flow: it will not be able 
to auto-recover by overwriting the corrupted index file.

Check the attached screenshots for more reference.

cc [~divijvaidya] [~satish.duggana] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-27 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769459#comment-17769459
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] [~satish.duggana] I discovered a bug while writing test cases 
for inconsistencies between disk storage and the RemoteIndexCache. Please have 
a look:
https://issues.apache.org/jira/browse/KAFKA-15511

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15511) Exception not handled correctly if indexFile is corrupted.

2023-09-27 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal updated KAFKA-15511:

Attachment: Screenshot 2023-09-27 at 1.01.58 PM.png
Screenshot 2023-09-27 at 1.14.22 PM.png

>  Exception not handled correctly if indexFile is corrupted.
> ---
>
> Key: KAFKA-15511
> URL: https://issues.apache.org/jira/browse/KAFKA-15511
> Project: Kafka
>  Issue Type: Bug
>Reporter: Arpit Goyal
>Assignee: Arpit Goyal
>Priority: Major
> Attachments: Screenshot 2023-09-27 at 1.01.58 PM.png, Screenshot 
> 2023-09-27 at 1.14.22 PM.png
>
>
> I was simulating a code flow where there is a possibility of inconsistency 
> between the RemoteIndexCache and the files stored on disk.
> 1. A corrupt OffsetIndex file already exists on disk.
> 2. No entry exists in the RemoteIndexCache.
> 3. Call getIndexEntry.
> 4. As the file already exists on disk and is corrupted, indexSanityCheck 
> throws *CorruptIndexException*.
> 5. But the code flow in RemoteIndexCache catches only *CorruptRecordException*.
> Ideally it should catch *CorruptIndexException* instead of 
> *CorruptRecordException*.
> Impact - functionality is broken in the above code flow: it will not be able 
> to auto-recover by overwriting the corrupted index file.
> Check the attached screenshots for more reference.
> cc [~divijvaidya] [~satish.duggana] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769425#comment-17769425
 ] 

Arpit Goyal edited comment on KAFKA-15169 at 9/27/23 5:37 AM:
--

[~divijvaidya] As per the code, the RemoteIndexCache never retries if a file 
gets corrupted after the remote storage fetch. I will create a separate ticket 
to track this enhancement. For the first test case, I am thinking of writing a 
test where a corruption exception should be thrown if files get corrupted 
during the remote fetch.
The other test cases we discussed I will cover as part of this JIRA. WDYT?


was (Author: JIRAUSER301926):
[~divijvaidya] As per the code, the RemoteIndexCache never retries if a file 
gets corrupted after the remote storage fetch. I will create a separate ticket 
to track this enhancement, which indirectly covers the first test case we 
discussed.
The other test cases we discussed I will cover as part of this JIRA. WDYT?

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769425#comment-17769425
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] As per the code, the RemoteIndexCache never retries if a file 
gets corrupted after the remote storage fetch. I will create a separate ticket 
to track this enhancement, which indirectly covers the first test case we 
discussed.
The other test cases we discussed I will cover as part of this JIRA. WDYT?

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769103#comment-17769103
 ] 

Arpit Goyal edited comment on KAFKA-15169 at 9/26/23 10:42 AM:
---

[~divijvaidya] Just to confirm what I understood of the code flow for the 
first test case:

1. We call getIndexEntry.
2. It throws a corruption exception (this exception will be thrown after 
fetching from remote storage), i.e.

{code:java}
Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false);
index = readIndex.apply(indexFile);  // throws the corruption exception
{code}
3. We call getIndexEntry again.
4. This time the file already exists on disk, and it will log the corruption 
error.
5. It will re-fetch from remote storage and pass the sanity check.
The test case is basically to test the flow when a corrupted file already 
exists on disk?


was (Author: JIRAUSER301926):
[~divijvaidya] Just to confirm what I understood of the flow for the first 
test case:

1. We call getIndexEntry.
2. It throws a corruption exception (this exception will be thrown after 
fetching from remote storage), i.e.

{code:java}
Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false);
index = readIndex.apply(indexFile);  // throws the corruption exception
{code}
3. We call getIndexEntry again.
4. This time the file already exists on disk, and it will log the corruption 
error.
5. It will re-fetch from remote storage and pass the sanity check.
The test case is basically to test the flow when a corrupted file already 
exists on disk?

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769103#comment-17769103
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] Just to confirm what I understood of the flow for the first 
test case:

1. We call getIndexEntry.
2. It throws a corruption exception (this exception will be thrown after 
fetching from remote storage), i.e.

{code:java}
Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false);
index = readIndex.apply(indexFile);  // throws the corruption exception
{code}
3. We call getIndexEntry again.
4. This time the file already exists on disk, and it will log the corruption 
error.
5. It will re-fetch from remote storage and pass the sanity check.
The test case is basically to test the flow when a corrupted file already 
exists on disk?

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769086#comment-17769086
 ] 

Arpit Goyal commented on KAFKA-15169:
-

Thanks [~divijvaidya]. 
I have two questions based on the code walkthrough:

{code:java}
private <T> T loadIndexFile(File file, RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                            Function<RemoteLogSegmentMetadata, InputStream> fetchRemoteIndex,
                            Function<File, T> readIndex) throws IOException {
    File indexFile = new File(cacheDir, file.getName());
    T index = null;
    if (Files.exists(indexFile.toPath())) {
        try {
            index = readIndex.apply(indexFile);
        } catch (CorruptRecordException ex) {
            log.info("Error occurred while loading the stored index file {}", indexFile.getPath(), ex);
        }
    }
    if (index == null) {
        File tmpIndexFile = new File(indexFile.getParentFile(), indexFile.getName() + RemoteIndexCache.TMP_FILE_SUFFIX);
        try (InputStream inputStream = fetchRemoteIndex.apply(remoteLogSegmentMetadata)) {
            Files.copy(inputStream, tmpIndexFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false);
        index = readIndex.apply(indexFile);
    }
    return index;
}
{code}


In the RemoteIndexCache loadIndexFile function:
1. First we check whether the file exists on disk and do a sanity check. I 
believe this part of the code will never be executed, as it runs only on a 
cache miss.
2. As per the first test case, it would throw a CorruptRecordException in the 
later part of the code, where we fetch the index from the remote segment and 
do a sanity check:

{code:java}
Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false);
index = readIndex.apply(indexFile);
{code}
I was assuming the first test case was about a file that already exists on 
disk before getIndexEntry is called:
1. Create an empty/corrupt file on disk.
2. Call getIndexEntry.
3. It throws a corruption exception.
4. In the next line it fetches the index from remote storage and restores the 
file.

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-25 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769006#comment-17769006
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] [~satish.duggana] In test case 1, which you mentioned for 
testing the corruption scenario, I discovered one more use case we are not 
handling, which can be problematic.
Example:
1. We call getIndexEntry.
2. It succeeds.
3. Somehow the offset index file gets corrupted.
4. We call getIndexEntry again.
5. As the entry is already in the cache, it would always return the corrupted 
file without running the sanity check.



> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-24 Thread Arpit Goyal (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-15169 ]


Arpit Goyal deleted comment on KAFKA-15169:
-

was (Author: JIRAUSER301926):
[~divijvaidya] In the first test case, step no. 3:
Do we support this fetchAndCreateIndex functionality?
As per the codebase there is only the getIndexEntry function, which calls 
remote storage only on a cache miss.


> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-24 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768372#comment-17768372
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] In the first test case, step no. 3:
Do we support this fetchAndCreateIndex functionality?
As per the codebase there is only the getIndexEntry function, which calls 
remote storage only on a cache miss.


> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-24 Thread Arpit Goyal (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-15169 ]


Arpit Goyal deleted comment on KAFKA-15169:
-

was (Author: JIRAUSER301926):
[~divijvaidya] I have started working on it. Can you help me understand what 
"fetchAndCreateIndex" means? No such function exists in the 
RemoteStorageManager class. Does it mean that while fetching from the cache it 
should create an index entry?

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-24 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768352#comment-17768352
 ] 

Arpit Goyal edited comment on KAFKA-15169 at 9/24/23 10:55 AM:
---

[~divijvaidya] I have started working on it. Can you help me understand what 
"fetchAndCreateIndex" means? No such function exists in the 
RemoteStorageManager class. Does it mean that while fetching from the cache it 
should create an index entry?


was (Author: JIRAUSER301926):
[~divijvaidya] I have started working on it. Can you help me understand what 
"fetchAndCreateIndex" means? No such function exists in the 
RemoteStorageManager class.

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-24 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768352#comment-17768352
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] I have started working on it. Can you help me understand what 
"fetchAndCreateIndex" means? No such function exists in the 
RemoteStorageManager class.

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-21 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-15169:
---

Assignee: Arpit Goyal  (was: Lan Ding)

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Arpit Goyal
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-21 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767639#comment-17767639
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~divijvaidya] [~isding_l] [~satish.duggana] I am picking this up, as I see 
no activity on it since July.

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Lan Ding
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache

2023-09-18 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766318#comment-17766318
 ] 

Arpit Goyal commented on KAFKA-15169:
-

[~isding_l] Can I pick this up?

> Add tests for RemoteIndexCache
> --
>
> Key: KAFKA-15169
> URL: https://issues.apache.org/jira/browse/KAFKA-15169
> Project: Kafka
>  Issue Type: Test
>Reporter: Satish Duggana
>Assignee: Lan Ding
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14954) Use BufferPools to optimize allocation in RemoteLogInputStream

2023-08-28 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759558#comment-17759558
 ] 

Arpit Goyal commented on KAFKA-14954:
-

[~abhijeetkumar] Can I pick this up, or are you already working on it?

> Use BufferPools to optimize allocation in RemoteLogInputStream
> --
>
> Key: KAFKA-14954
> URL: https://issues.apache.org/jira/browse/KAFKA-14954
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Abhijeet Kumar
>Priority: Minor
>
> ref: https://github.com/apache/kafka/pull/13535#discussion_r1180144730



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15378) Rolling upgrade system tests are failing

2023-08-28 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759522#comment-17759522
 ] 

Arpit Goyal commented on KAFKA-15378:
-

[~lbrutschy] Not yet. I just tried to reproduce the issue, but it is stuck at 
the error below; do you know the reason for it?
Could not detect Kafka Streams version 3.6.0-SNAPSHOT on ducker@ducker12

> Rolling upgrade system tests are failing
> 
>
> Key: KAFKA-15378
> URL: https://issues.apache.org/jira/browse/KAFKA-15378
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Affects Versions: 3.5.1
>Reporter: Lucas Brutschy
>Priority: Major
>
> The system tests are having failures for these tests:
> {noformat}
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.1.2.to_version=3.6.0-SNAPSHOT
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.2.3.to_version=3.6.0-SNAPSHOT
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.3.1.to_version=3.6.0-SNAPSHOT
> {noformat}
> See 
> [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5801/console]
>  for logs and other test data.
> Note that system tests currently only run with [this 
> fix](https://github.com/apache/kafka/commit/24d1780061a645bb2fbeefd8b8f50123c28ca94e);
>  I think some CVE Python library update broke the system tests... 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15378) Rolling upgrade system tests are failing

2023-08-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759270#comment-17759270
 ] 

Arpit Goyal edited comment on KAFKA-15378 at 8/26/23 5:33 PM:
--

[~lbrutschy] As per the description, do you mean the fix will work with this 
change?

install_requires=["ducktape<0.9", "requests==2.31.0"],

I am trying to reproduce the issue locally, but I am getting this error:
{code:java}
[INFO:2023-08-26 10:29:56,640]: RunnerClient: 
kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=0.10.1.1.to_version=3.6.0-SNAPSHOT:
 FAIL: TimeoutError('Could not detect Kafka Streams version 3.6.0-SNAPSHOT on 
ducker@ducker12')
{code}
However, it does not seem to match the logs attached in the ticket?



was (Author: JIRAUSER301926):
[~lbrutschy] As per the description, do you mean the fix will work with this 
change?

install_requires=["ducktape<0.9", "requests==2.31.0"],

> Rolling upgrade system tests are failing
> 
>
> Key: KAFKA-15378
> URL: https://issues.apache.org/jira/browse/KAFKA-15378
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Affects Versions: 3.5.1
>Reporter: Lucas Brutschy
>Priority: Major
>
> The system tests are having failures for these tests:
> {noformat}
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.1.2.to_version=3.6.0-SNAPSHOT
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.2.3.to_version=3.6.0-SNAPSHOT
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.3.1.to_version=3.6.0-SNAPSHOT
> {noformat}
> See 
> [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5801/console]
>  for logs and other test data.
> Note that system tests currently only run with [this 
> fix](https://github.com/apache/kafka/commit/24d1780061a645bb2fbeefd8b8f50123c28ca94e);
>  I think some CVE Python library update broke the system tests... 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15378) Rolling upgrade system tests are failing

2023-08-26 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759270#comment-17759270
 ] 

Arpit Goyal commented on KAFKA-15378:
-

[~lbrutschy] As per the description, do you mean the fix will work with this 
change?

install_requires=["ducktape<0.9", "requests==2.31.0"],

> Rolling upgrade system tests are failing
> 
>
> Key: KAFKA-15378
> URL: https://issues.apache.org/jira/browse/KAFKA-15378
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Affects Versions: 3.5.1
>Reporter: Lucas Brutschy
>Priority: Major
>
> The system tests are having failures for these tests:
> {noformat}
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.1.2.to_version=3.6.0-SNAPSHOT
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.2.3.to_version=3.6.0-SNAPSHOT
> kafkatest.tests.streams.streams_upgrade_test.StreamsUpgradeTest.test_rolling_upgrade_with_2_bounces.from_version=3.3.1.to_version=3.6.0-SNAPSHOT
> {noformat}
> See 
> [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5801/console]
>  for logs and other test data.
> Note that system tests currently only run with [this 
> fix](https://github.com/apache/kafka/commit/24d1780061a645bb2fbeefd8b8f50123c28ca94e);
>  I think some CVE Python library update broke the system tests... 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15401) Segment with corrupted index should not be uploaded to remote storage

2023-08-25 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759205#comment-17759205
 ] 

Arpit Goyal commented on KAFKA-15401:
-

[~nickstery] Is this resolved, or can I pick it up?

> Segment with corrupted index should not be uploaded to remote storage
> -
>
> Key: KAFKA-15401
> URL: https://issues.apache.org/jira/browse/KAFKA-15401
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.6.0
>Reporter: Viktor Nikitash
>Priority: Minor
>  Labels: KIP-405
> Fix For: 3.6.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> During a data-disk-full event, there can be a situation with index 
> corruption. There are existing functions which perform sanity checks on 
> TimeIndex [1], TxnIndex [2], and OffsetIndex [3]. The idea is to perform the 
> same checks in RemoteLogManager before we upload a segment to remote storage 
> [4].
> Resources:
> [1][TimeIndex::sanityCheck()|https://github.com/apache/kafka/blob/88d2c4460a1c8c8cf5dbcc9edb43f42fe898ca00/storage/src/main/java/org/apache/kafka/storage/internals/log/TimeIndex.java#L73]
> [2][TransationIndex::sanityCheck()|https://github.com/apache/kafka/blob/88d2c4460a1c8c8cf5dbcc9edb43f42fe898ca00/storage/src/main/java/org/apache/kafka/storage/internals/log/TransactionIndex.java#L187]
> [3][OffsetIndex::sanityCheck()|#L78]
> [4][RemoteLogManager::copyLogSegmentsToRemote()|https://github.com/apache/kafka/blob/88d2c4460a1c8c8cf5dbcc9edb43f42fe898ca00/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L649]
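A minimal sketch of the idea (the helper and its placement in 
RemoteLogManager::copyLogSegmentsToRemote are the ticket's suggestion, not 
existing code; each sanityCheck() call throws a CorruptIndexException on 
corruption):

{code:java}
// Run the existing index sanity checks before uploading a segment to remote storage.
private void sanityCheckIndexes(LogSegment segment) {
    segment.offsetIndex().sanityCheck();
    segment.timeIndex().sanityCheck();
    segment.txnIndex().sanityCheck();
}
{code}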



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15405) Create a new error code to indicate a resource is not ready yet

2023-08-25 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759067#comment-17759067
 ] 

Arpit Goyal commented on KAFKA-15405:
-

[~abhijeetkumar]  Can you provide a context or document to proceed further on 
this.

> Create a new error code to indicate a resource is not ready yet
> ---
>
> Key: KAFKA-15405
> URL: https://issues.apache.org/jira/browse/KAFKA-15405
> Project: Kafka
>  Issue Type: Task
>Reporter: Abhijeet Kumar
>Assignee: Arpit Goyal
>Priority: Minor
>
> We need a new error code to indicate to the client that the resource is not 
> ready on the server yet and is still initializing. When the client receives 
> this error, it should retry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15405) Create a new error code to indicate a resource is not ready yet

2023-08-25 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-15405:
---

Assignee: Arpit Goyal

> Create a new error code to indicate a resource is not ready yet
> ---
>
> Key: KAFKA-15405
> URL: https://issues.apache.org/jira/browse/KAFKA-15405
> Project: Kafka
>  Issue Type: Task
>Reporter: Abhijeet Kumar
>Assignee: Arpit Goyal
>Priority: Minor
>
> We need a new error code to indicate to the client that the resource is not 
> ready on the server yet and is still initializing. When the client receives 
> this error, it should retry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15256) Add code reviewers to contributors list in release email

2023-08-24 Thread Arpit Goyal (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758714#comment-17758714
 ] 

Arpit Goyal commented on KAFKA-15256:
-

Hi [~divijvaidya] 

Can you please review https://github.com/apache/kafka/pull/14288

 

> Add code reviewers to contributors list in release email
> 
>
> Key: KAFKA-15256
> URL: https://issues.apache.org/jira/browse/KAFKA-15256
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Minor
>
> Today, we parse the names from commit messages and the authors and co-authors 
> are added as contributors in the release email. We should add reviewers as 
> well. This can be done by parsing the "reviewed by:" field in the commit 
> message.
> Context, see conversation at: https://github.com/apache/kafka/pull/14080
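A minimal sketch of the parsing step (the release tooling is actually a Python 
script; the Java below, using java.util and java.util.regex, is only for 
illustration, and the "Reviewers:" trailer format is assumed):

{code:java}
// Collect reviewer names from "Reviewers: Alice, Bob <bob@example.org>" trailers.
static Set<String> reviewers(List<String> commitMessages) {
    Pattern trailer = Pattern.compile("^reviewers?:\\s*(.+)$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    Set<String> names = new TreeSet<>();
    for (String message : commitMessages) {
        Matcher m = trailer.matcher(message);
        while (m.find()) {
            for (String name : m.group(1).split(","))
                names.add(name.trim());
        }
    }
    return names;
}
{code}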



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15256) Add code reviewers to contributors list in release email

2023-08-24 Thread Arpit Goyal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Goyal reassigned KAFKA-15256:
---

Assignee: Arpit Goyal

> Add code reviewers to contributors list in release email
> 
>
> Key: KAFKA-15256
> URL: https://issues.apache.org/jira/browse/KAFKA-15256
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Divij Vaidya
>Assignee: Arpit Goyal
>Priority: Minor
>
> Today, we parse the names from commit messages and the authors and co-authors 
> are added as contributors in the release email. We should add reviewers as 
> well. This can be done by parsing the "reviewed by:" field in the commit 
> message.
> Context, see conversation at: https://github.com/apache/kafka/pull/14080



--
This message was sent by Atlassian Jira
(v8.20.10#820010)