[ 
https://issues.apache.org/jira/browse/KAFKA-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767645#comment-17767645
 ] 

Ben Manes commented on KAFKA-15481:
-----------------------------------

Excellent investigation! You are correct: using a computation for explicit 
removals and an {{EvictionListener}} for automatic ones is the right 
approach. Both perform the work under the {{ConcurrentHashMap}} entry lock, 
so it is atomic with the map operation.
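
As a rough illustration of the explicit-removal half, here is a minimal sketch 
of a removal done inside a map computation. It uses a plain 
{{ConcurrentHashMap}} as a stand-in for the view that {{Cache.asMap()}} 
returns, so that it runs without Caffeine on the classpath. {{IndexEntry}}, 
{{AtomicRemovalSketch}}, and {{markForDeletion}} are hypothetical names made up 
for this example, not Kafka or Caffeine APIs, and the eviction listener half 
is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a cached index entry backed by a file. */
final class IndexEntry {
    final String fileName;
    volatile boolean markedForDeletion;

    IndexEntry(String fileName) {
        this.fileName = fileName;
    }

    /** Placeholder for renaming the backing file with a "deleted" suffix. */
    void markForDeletion() {
        markedForDeletion = true;
    }
}

final class AtomicRemovalSketch {
    // Assumption: with Caffeine this map would be the view from cache.asMap().
    private final ConcurrentMap<String, IndexEntry> entries = new ConcurrentHashMap<>();

    /** Loads the entry if absent; a concurrent removal of the same key cannot interleave. */
    IndexEntry getOrLoad(String key) {
        return entries.computeIfAbsent(key, k -> new IndexEntry(k + ".index"));
    }

    /** Explicit removal: the cleanup runs inside the computation, atomically with the removal. */
    IndexEntry remove(String key) {
        IndexEntry[] removed = new IndexEntry[1];
        entries.computeIfPresent(key, (k, entry) -> {
            entry.markForDeletion(); // runs while the map holds the lock for this key
            removed[0] = entry;
            return null;             // returning null removes the mapping
        });
        return removed[0];           // null if the key was not present
    }

    boolean contains(String key) {
        return entries.containsKey(key);
    }
}
```

Because the cleanup happens inside {{computeIfPresent}}, a concurrent 
{{getOrLoad}} for the same key has to wait until the old entry is marked, 
rather than observing a missing mapping while the old file is still in place.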

The {{removalListener}} option is asynchronous (not atomic with the removal) in 
Guava and therefore in Caffeine. When adding a synchronous variant, we couldn't 
cover explicit removals because of {{AsyncCache}}: a user could clear all of 
the in-flight mappings and would be very surprised if that call blocked waiting 
for each one to complete. An async cache's entry is eligible for eviction only 
once it has completed, so the synchronous variant fit very nicely as an 
eviction listener, with {{asMap}} computations used for explicit calls.

> Concurrency bug in RemoteIndexCache leads to IOException
> --------------------------------------------------------
>
>                 Key: KAFKA-15481
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15481
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.6.0
>            Reporter: Divij Vaidya
>            Priority: Major
>             Fix For: 3.7.0
>
>
> RemoteIndexCache has a concurrency bug which leads to IOException while 
> fetching data from remote tier.
> The events below are listed in timeline order:
> Thread 1 (cache thread): invalidates the entry; removalListener is invoked 
> async, so the files have not been renamed with the "deleted" suffix yet.
> Thread 2 (fetch thread): tries to find the entry in the cache, doesn't find it 
> because it has been removed by Thread 1, fetches the entry from S3, and writes 
> it to the existing file (using replace existing).
> Thread 1: the async removalListener is invoked, acquires a lock on the old 
> entry (which has been removed from the cache), renames the file to "deleted", 
> and starts deleting it.
> Thread 2: tries to create the in-memory/mmapped index, but doesn't find the 
> file and hence creates a new file of size 2GB in the AbstractIndex constructor. 
> The JVM returns an error as it won't allow creation of a 2GB random access file.
> *Potential Fix*
> Use EvictionListener instead of RemovalListener in Caffeine cache as per the 
> documentation:
> {quote} When the operation must be performed synchronously with eviction, use 
> {{Caffeine.evictionListener(RemovalListener)}} instead. This listener will 
> only be notified when {{RemovalCause.wasEvicted()}} is true. For an explicit 
> removal, {{Cache.asMap()}} offers compute methods that are performed 
> atomically.{quote}
> This will ensure that removal from the cache and marking the file with the 
> delete suffix are done synchronously, hence the above race condition will not 
> occur.


