[ 
https://issues.apache.org/jira/browse/KAFKA-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-15481.
----------------------------------
    Resolution: Fixed

> Concurrency bug in RemoteIndexCache leads to IOException
> --------------------------------------------------------
>
>                 Key: KAFKA-15481
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15481
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.6.0
>            Reporter: Divij Vaidya
>            Assignee: Jeel Jotaniya
>            Priority: Major
>             Fix For: 3.7.0, 3.6.1
>
>
> RemoteIndexCache has a concurrency bug which leads to IOException while 
> fetching data from remote tier.
> Below events in order of timeline -
> Thread 1 (cache thread): invalidates the entry, removalListener is invoked 
> async, so the files have not been renamed to "deleted" suffix yet.
> Thread 2: (fetch thread): tries to find entry in cache, doesn't find it 
> because it has been removed by 1, fetches the entry from S3, writes it to 
> existing file (using replace existing)
> Thread 1: async removalListener is invoked, acquires a lock on old entry 
> (which has been removed from cache), it renames the file to "deleted" and 
> starts deleting it
> Thread 2: Tries to create in-memory/mmapped index, but doesn't find the file 
> and hence, creates a new file of size 2GB in AbstractIndex constructor. JVM 
> returns an error as it won't allow creation of 2GB random access file.
> *Potential Fix*
> Use EvictionListener instead of RemovalListener in Caffeine cache as per the 
> documentation:
> {quote} When the operation must be performed synchronously with eviction, use 
> {{Caffeine.evictionListener(RemovalListener)}} instead. This listener will 
> only be notified when {{RemovalCause.wasEvicted()}} is true. For an explicit 
> removal, {{Cache.asMap()}} offers compute methods that are performed 
> atomically.{quote}
> This will ensure that removal from cache and marking the file with delete 
> suffix is synchronously done, hence the above race condition will not occur.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to