adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager
URL: https://github.com/apache/hadoop-ozone/pull/659
 
 
   ## What changes were proposed in this pull request?
   
   `TestBlockManager` intermittently times out while waiting to exit safe mode. This happens due to a race condition between two safe mode status events, one from SCM and one from the test code, which are handled in different threads by the same handler object.
   
   Temporary debug log (in "passing" order):
   
   ```
   (SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling safe mode status event in thread 26: true
   (SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling safe mode status event in thread 28: false
   ```
   
   If the order is reversed, SCM may stay in safe mode as far as `BlockManagerImpl` can tell. Worse, from its point of view SCM may return to safe mode while `BlockManagerImpl` is performing an operation, e.g.:
   
   ```
   SCMException: SafeModePrecheck failed for allocateBlock
   ...
     at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:160)
     at org.apache.hadoop.hdds.scm.block.TestBlockManager.testAllocateBlock(TestBlockManager.java:150)
   ```
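
The ordering problem described above can be illustrated with a minimal, single-threaded sketch. It is not the Ozone code: `StatusHandler`, `finalSafeMode` and the string returned by `allocateBlock` are hypothetical stand-ins, and the two delivery orders are simulated sequentially rather than with real handler threads so that the outcome is deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SafeModeRaceSketch {

  /** Hypothetical stand-in for a component that caches the latest safe mode status. */
  public static class StatusHandler {
    private final AtomicBoolean inSafeMode = new AtomicBoolean(true);

    /** The last event delivered wins, whichever sender it came from. */
    public void onMessage(boolean status) {
      inSafeMode.set(status);
    }

    public boolean isInSafeMode() {
      return inSafeMode.get();
    }

    /** Mirrors the idea of a safe mode precheck: refuse work while in safe mode. */
    public String allocateBlock() {
      if (inSafeMode.get()) {
        throw new IllegalStateException("SafeModePrecheck failed for allocateBlock");
      }
      return "block-1"; // placeholder result, not a real block ID
    }
  }

  /** Delivers two status events in the given order and returns the resulting state. */
  public static boolean finalSafeMode(boolean first, boolean second) {
    StatusHandler handler = new StatusHandler();
    handler.onMessage(first);
    handler.onMessage(second);
    return handler.isInSafeMode();
  }

  public static void main(String[] args) {
    // "Passing" order: the SCM-side event (true) arrives before the test's event (false).
    System.out.println("passing order, in safe mode: " + finalSafeMode(true, false));
    // Reversed order: the test's event (false) is overwritten by the SCM-side event (true).
    System.out.println("reversed order, in safe mode: " + finalSafeMode(false, true));
  }
}
```

In the reversed order the handler ends up in safe mode, so a later `allocateBlock` call in this sketch fails its precheck, which corresponds to the exception shown in the stack trace.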
   
   The proposed fix is to disable safe mode status emission (i.e. ignore the event from SCM) and let the test set safe mode explicitly in `BlockManagerImpl`. This should be fine since this is a unit test, not an integration test.
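
The idea behind the fix can be sketched as follows. This is an illustration only, not the actual patch: `BlockAllocator`, `SafeModeSource`, the `emitStatus` flag and `runTestSetup` are hypothetical names, and the real change uses whatever mechanism the SCM code provides for turning off status emission.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ExplicitSafeModeSketch {

  /** Hypothetical component under test; it caches the safe mode status it was given. */
  public static class BlockAllocator {
    private final AtomicBoolean inSafeMode = new AtomicBoolean(true);

    public void setSafeModeStatus(boolean status) {
      inSafeMode.set(status);
    }

    public boolean isInSafeMode() {
      return inSafeMode.get();
    }
  }

  /** Hypothetical status source whose emission can be switched off for unit tests. */
  public static class SafeModeSource {
    private final boolean emitStatus;
    private final BlockAllocator listener;

    public SafeModeSource(boolean emitStatus, BlockAllocator listener) {
      this.emitStatus = emitStatus;
      this.listener = listener;
    }

    /** Forwards the status only when emission is enabled. */
    public void reportStatus(boolean status) {
      if (emitStatus) {
        listener.setSafeModeStatus(status);
      }
    }
  }

  /** Simulates the worst-case ordering: the source's event arrives after the test's. */
  public static boolean runTestSetup(boolean emitStatus) {
    BlockAllocator allocator = new BlockAllocator();
    SafeModeSource source = new SafeModeSource(emitStatus, allocator);
    allocator.setSafeModeStatus(false); // the test leaves safe mode explicitly
    source.reportStatus(true);          // late event from the source
    return allocator.isInSafeMode();
  }

  public static void main(String[] args) {
    System.out.println("emission enabled, in safe mode: " + runTestSetup(true));
    System.out.println("emission disabled, in safe mode: " + runTestSetup(false));
  }
}
```

With emission disabled there is only one writer of the status in this sketch, so the state no longer depends on event ordering.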
   
   https://issues.apache.org/jira/browse/HDDS-2989
   
   ## How was this patch tested?
   
   Ran TestBlockManager 10x:
   https://github.com/adoroszlai/hadoop-ozone/runs/497791137
   
   then 50x:
   https://github.com/adoroszlai/hadoop-ozone/runs/497839450
   
   and regular full CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/498781616

