prashantpogde commented on a change in pull request #1257:
URL: https://github.com/apache/hadoop-ozone/pull/1257#discussion_r461326748



##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -525,7 +523,7 @@ public void testScmInfo() throws Exception {
   /**
    * Test datanode heartbeat well processed with a 4-layer network topology.
    */
-  @Test(timeout = 60000)
+  @Test(timeout = 180000)
   public void testScmProcessDatanodeHeartbeat() throws Exception {

Review comment:
       I tested it multiple times on my laptop, and the test used to time out 
occasionally depending on what other applications were running. I set the 
timeout long enough that the test should no longer be flaky.

##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -604,8 +602,13 @@ public void testCloseContainerCommandOnRestart() throws 
Exception {
       // Stop processing HB
       scm.getDatanodeProtocolServer().stop();
 
-      scm.getContainerManager().updateContainerState(selectedContainer
-          .containerID(), HddsProtos.LifeCycleEvent.FINALIZE);
+      LoggerFactory.getLogger(TestStorageContainerManager.class).info(

Review comment:
       done

##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -604,8 +602,13 @@ public void testCloseContainerCommandOnRestart() throws 
Exception {
       // Stop processing HB
       scm.getDatanodeProtocolServer().stop();
 
-      scm.getContainerManager().updateContainerState(selectedContainer
-          .containerID(), HddsProtos.LifeCycleEvent.FINALIZE);
+      LoggerFactory.getLogger(TestStorageContainerManager.class).info(
+          "Current Container State is" + selectedContainer.getState());
+      if (selectedContainer.getState() == HddsProtos.LifeCycleState.OPEN) {
+        scm.getContainerManager().updateContainerState(selectedContainer
+            .containerID(), HddsProtos.LifeCycleEvent.FINALIZE);

Review comment:
       No, I couldn't find another thread closing the container. One theory 
is that we stopped processing heartbeats from the datanode, and if SCM then 
assumes the datanode is dead, it could close the container. 
   Yes, there is still a race condition unless the check and the update can be 
done atomically. I changed it to a try/catch to avoid the race.
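
   To illustrate the try/catch idea, here is a minimal, self-contained sketch. 
It is not the code in this PR: the Container class, its finalizeContainer 
method, and the use of IllegalStateException are all hypothetical stand-ins for 
the real container manager and whatever exception it throws on an invalid 
transition. The point is only that attempting the transition and tolerating a 
failure avoids the check-then-act gap of an "if state == OPEN" guard.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FinalizeRaceSketch {
  // Hypothetical, simplified lifecycle; the real one has more states.
  enum State { OPEN, CLOSING, CLOSED }

  /** Hypothetical stand-in for a container whose state another thread may change. */
  static class Container {
    private final AtomicReference<State> state =
        new AtomicReference<>(State.OPEN);

    State getState() {
      return state.get();
    }

    /** Moves OPEN -> CLOSING atomically; throws if the container is not OPEN. */
    void finalizeContainer() {
      if (!state.compareAndSet(State.OPEN, State.CLOSING)) {
        throw new IllegalStateException(
            "Cannot finalize container in state " + state.get());
      }
    }
  }

  /**
   * Attempts the transition and tolerates losing the race.
   * Returns true only if this caller performed the transition.
   */
  static boolean tryFinalize(Container container) {
    try {
      container.finalizeContainer();
      return true;
    } catch (IllegalStateException e) {
      // Someone else already moved the container out of OPEN; that is fine
      // for a test that only needs the container to be closing.
      return false;
    }
  }

  public static void main(String[] args) {
    Container container = new Container();
    System.out.println(tryFinalize(container)); // first caller wins
    System.out.println(tryFinalize(container)); // second attempt is tolerated
    System.out.println(container.getState());
  }
}
```

   In the real test the catch clause would need to match the exception that the 
container manager actually throws, which this sketch does not assume.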

##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -593,7 +591,7 @@ public void testCloseContainerCommandOnRestart() throws 
Exception {
           new TestStorageContainerManagerHelper(cluster, conf);
 
       helper.createKeys(10, 4096);
-      Thread.sleep(5000);
+      Thread.sleep(10000);

Review comment:
       done

##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -604,8 +602,13 @@ public void testCloseContainerCommandOnRestart() throws 
Exception {
       // Stop processing HB
       scm.getDatanodeProtocolServer().stop();
 
-      scm.getContainerManager().updateContainerState(selectedContainer
-          .containerID(), HddsProtos.LifeCycleEvent.FINALIZE);
+      LoggerFactory.getLogger(TestStorageContainerManager.class).info(
+          "Current Container State is" + selectedContainer.getState());

Review comment:
       done

##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -593,7 +591,7 @@ public void testCloseContainerCommandOnRestart() throws 
Exception {
           new TestStorageContainerManagerHelper(cluster, conf);
 
       helper.createKeys(10, 4096);
-      Thread.sleep(5000);
+      Thread.sleep(10000);

Review comment:
       I spent some time looking around in the code. I'm not sure how we can 
do this cleanly for the other sleep.

##########
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
##########
@@ -593,7 +591,7 @@ public void testCloseContainerCommandOnRestart() throws 
Exception {
           new TestStorageContainerManagerHelper(cluster, conf);
 
       helper.createKeys(10, 4096);
-      Thread.sleep(5000);
+      Thread.sleep(10000);

Review comment:
       Changed this to:
    - wait until the replication manager comes up, using 
GenericTestUtils.waitFor, and then
    - wait some more to give it enough time to process containers
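
   The two steps above can be sketched roughly as follows. This is a 
self-contained illustration, not the change in this PR: the waitFor helper 
below is a hypothetical stand-in written for the example (it is not the Hadoop 
GenericTestUtils implementation), and the readiness condition and the 100 ms 
grace period are made-up values.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

  /**
   * Hypothetical polling helper: re-checks the condition every
   * checkEveryMillis until it holds or waitForMillis has elapsed.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();

    // Step 1: poll until the component reports it is up. Here the
    // "component" is simulated by a condition that turns true after 200 ms.
    waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5000);

    // Step 2: a short, bounded grace period so the component can do some
    // work. This is still a fixed sleep, just a much smaller one.
    Thread.sleep(100);

    System.out.println("ready");
  }
}
```

   Polling returns as soon as the condition holds, so the test only pays the 
full timeout when something is actually wrong, unlike a fixed 
Thread.sleep(10000).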




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


