ableegoldman opened a new pull request #11868:
URL: https://github.com/apache/kafka/pull/11868


   This test has become flaky at a low but consistently reproducible rate. 
Upon inspection, the failures are due to IOExceptions thrown during the 
`#cleanUpNamedTopology` call: most often a `DirectoryNotEmptyException`, with 
an occasional `FileNotFoundException`.
   
   The signs pointed to the `#removeNamedTopology` future completing 
prematurely, so the test moved on to clear out the topology's state directory 
while a StreamThread somewhere was still processing or closing its tasks.
   
   I believe this is due to updating the thread's topology version _before_ we 
perform the actual topology update, which in this case includes, e.g., clearing 
out a directory. Suppose one thread updates its version and then goes on to 
perform the topology removal/cleanup. If a second thread finishes its own 
topology removal in the meantime, it will check whether all threads are on the 
latest version and, if so, complete any waiting futures. That means it can 
complete the future before the first thread has actually completed the 
corresponding action.
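
   The suspected interleaving can be illustrated with a small, single-threaded 
model. This is only a sketch under stated assumptions, not the Kafka Streams 
implementation: the class `VersionOrderingSketch` and every method in it are 
hypothetical names, the two "threads" are replayed as a fixed sequence of 
steps, and the "fixed" ordering (bump the version only after the cleanup) is 
an assumption about what the diagnosis above implies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical model: none of these names come from Kafka Streams itself.
class VersionOrderingSketch {
    private final Map<String, Long> threadVersions = new HashMap<>();
    private final long latestVersion;
    private int cleanupsFinished = 0;

    // Completes with the number of cleanups that had finished at that moment.
    final CompletableFuture<Integer> removalFuture = new CompletableFuture<>();

    VersionOrderingSketch(long latestVersion, String... threads) {
        this.latestVersion = latestVersion;
        for (String t : threads) {
            threadVersions.put(t, latestVersion - 1); // everyone starts one version behind
        }
    }

    void bumpVersion(String thread) {
        threadVersions.put(thread, latestVersion);
    }

    void doCleanup(String thread) {
        cleanupsFinished++; // stands in for closing tasks / clearing a directory
    }

    // Each thread runs this check after its own update.
    void maybeCompleteFuture() {
        boolean allCaughtUp =
            threadVersions.values().stream().allMatch(v -> v == latestVersion);
        if (allCaughtUp) {
            removalFuture.complete(cleanupsFinished); // no-op if already completed
        }
    }

    // Replays one fixed interleaving of threads "A" and "B".
    static int run(boolean bumpBeforeCleanup) {
        VersionOrderingSketch s = new VersionOrderingSketch(2L, "A", "B");
        if (bumpBeforeCleanup) {
            s.bumpVersion("A");      // A advertises the new version early...
            s.bumpVersion("B");
            s.doCleanup("B");
            s.maybeCompleteFuture(); // ...so B sees all threads "caught up"
            s.doCleanup("A");        // A's cleanup only happens now
            s.maybeCompleteFuture();
        } else {
            s.doCleanup("B");
            s.bumpVersion("B");
            s.maybeCompleteFuture(); // A is still behind: future stays pending
            s.doCleanup("A");
            s.bumpVersion("A");
            s.maybeCompleteFuture(); // both cleanups are done before completion
        }
        return s.removalFuture.getNow(-1);
    }

    public static void main(String[] args) {
        System.out.println("version bumped before cleanup: " + run(true));
        System.out.println("version bumped after cleanup:  " + run(false));
    }
}
```

   In this model the first ordering completes the future after only one of the 
two cleanups has run, while the second completes it only after both have 
finished.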


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

