showuon opened a new pull request #10301:
URL: https://github.com/apache/kafka/pull/10301


   Found the root cause of why the test 
`shouldNotViolateEosIfOneTaskFailsWithState` sometimes failed with an unexpected 
committed/uncommitted result. An unexpected rebalance happens while messages 
are being committed, and it triggers the failover mechanism. The rebalance is 
triggered because we reduced the `max.poll.interval.ms` value for 
`shouldNotViolateEosIfOneTaskGetsFencedUsingIsolatedAppInstances`, which 
stalls a thread until `max.poll.interval.ms` is exceeded in order to trigger 
a rebalance. In the `withState` case there is more work to do for the state 
and the additional topics, which explains why only 
`shouldNotViolateEosIfOneTaskFailsWithState` is flaky and the other tests are 
not. 
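
   To illustrate the idea, here is a minimal sketch of choosing 
`max.poll.interval.ms` per test. It is not the code from this PR: the class 
name, the helper method, and both timeout values are assumptions made up for 
illustration. Only the config key `max.poll.interval.ms` comes from the 
description above.

```java
import java.util.Properties;

class PollIntervalSketch {
    // Hypothetical values for illustration; the PR text does not state the real numbers.
    static final int SHORT_MAX_POLL_INTERVAL_MS = 5_000;     // short, so a stalled thread is kicked out quickly
    static final int RELAXED_MAX_POLL_INTERVAL_MS = 300_000; // long, so slow processing does not cause a rebalance

    // Builds consumer overrides for a test. Only tests that *want* a
    // stall-induced rebalance should use the short interval.
    static Properties consumerOverrides(boolean expectStallInducedRebalance) {
        Properties props = new Properties();
        int intervalMs = expectStallInducedRebalance
            ? SHORT_MAX_POLL_INTERVAL_MS
            : RELAXED_MAX_POLL_INTERVAL_MS;
        props.setProperty("max.poll.interval.ms", Integer.toString(intervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Fencing test: the stalled thread must exceed the interval.
        System.out.println(consumerOverrides(true));
        // withState test: no rebalance is expected, so use the relaxed interval.
        System.out.println(consumerOverrides(false));
    }
}
```

   Keeping the short interval only in the test that relies on the stall 
avoids accidental rebalances in tests that merely take longer to process.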
   
   I increased `max.poll.interval.ms` for the `withState` test to fix the 
flakiness. I also made some enhancements:
   1. Add a failure reason to the assertions. Currently, the failure message 
looks like `Expected: <[...]>, but: was: <[...]>`, which does not say whether 
the result is the committed or the uncommitted one, or whether it was taken 
before or after the injected error. We had to map the stack trace to find out.
   2. The `fail()` in `uncaughtException` only fails the stream thread, not 
the test. Fixed.
   3. Set the initial capacity of the ArrayList to avoid memory reallocation.
   4. Improve the comments, and add a view of the state for each phase.
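
   As a rough sketch of item 2, the snippet below shows one way to surface a 
background-thread failure in the test thread. It uses a plain 
`java.lang.Thread` rather than a Kafka Streams thread, and the class and 
method names are hypothetical. It is not the change made in this PR. The 
point is that throwing inside an uncaught exception handler affects only that 
thread, so the error has to be recorded and checked by the test thread.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtExceptionSketch {
    // Runs the work on a separate thread and returns whatever exception
    // escaped it, or null if the thread finished normally.
    static Throwable runAndCapture(Runnable work) {
        AtomicReference<Throwable> captured = new AtomicReference<>();
        Thread worker = new Thread(work, "worker-sketch");
        // Record the error instead of failing here: an assertion thrown in
        // the handler would never reach the test thread.
        worker.setUncaughtExceptionHandler((thread, error) -> captured.set(error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for worker", e);
        }
        return captured.get();
    }

    // Called from the test thread, so the AssertionError fails the test.
    // The phase label makes the failure message self-explanatory.
    static void assertNoBackgroundFailure(String phase, Throwable captured) {
        if (captured != null) {
            throw new AssertionError(
                "Unexpected exception in background thread during " + phase, captured);
        }
    }

    public static void main(String[] args) {
        Throwable ok = runAndCapture(() -> { });
        assertNoBackgroundFailure("phase without injected error", ok);

        Throwable failure = runAndCapture(() -> {
            throw new IllegalStateException("injected error");
        });
        System.out.println("captured: " + failure);
    }
}
```

   The descriptive label passed to `assertNoBackgroundFailure` follows the 
same idea as item 1: the message itself says which phase failed, so nobody 
has to read the stack trace to find out.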
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

