Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-25 Thread via GitHub


adoroszlai commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1856969129


##
hadoop-ozone/dist/src/main/compose/compatibility/docker-config:
##
@@ -21,7 +21,7 @@ 
OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB
 OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
 OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
 OZONE-SITE.XML_ozone.scm.names=scm
-OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data
+OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata

Review Comment:
   @swamirishi Turns out similar change is needed for kubernetes clusters, but 
the check was silently failing.  Please see #7483.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850789106


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   I have created a follow up jira for performing this validation:
   https://issues.apache.org/jira/browse/HDDS-11761



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850781332


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   in a follow up jira please accept write chunks only for EC pipelines if 
container is recovering.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi merged PR #7402:
URL: https://github.com/apache/ozone/pull/7402


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2489270850

   Thanks for the review on the patch @kerneltime @sodonnel @adoroszlai 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850756295


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   I am testing both creating an open container and recovering container. 
Agnostic of whether it is an EC key or a ratis key we land in the same 
dispatcher flow. So we don't have to test this separately 
https://github.com/apache/ozone/blob/827bc86915efbb398f024ccc467de5db7ecd347d/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java#L253-L256



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850756295


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   I am testing both creating an open container and recovering container. 
Agnostic of whether it is an EC key or a ratis key we land in the same 
dispatcher flow. So we don't have to test this separately 
https://github.com/apache/ozone/blob/827bc86915efbb398f024ccc467de5db7ecd347d/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java#L253-L256
   
   
https://github.com/apache/ozone/blob/827bc86915efbb398f024ccc467de5db7ecd347d/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java#L292-L296



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850756763


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  /**
+   * Add Container to container map. This would fail if the container is 
already present or has been marked as missing.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  /**
+   * Add Container to container map. This would overwrite the container even 
if it is missing. But would fail if the
+   * container is already present.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainerByOverwriteMissingContainer(Container 
container) throws StorageContainerException {
+return addContainer(container, true);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {
+if (missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d with state : %s is missing in" +
+  " the DN.", containerId, state),
+  ContainerProtos.Result.CONTAINER_MISSING);
+}
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  private boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850697263


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   Recovering container accepting writes should be for EC only? Do we want 
write chunk coming for a non EC container to fail? In this test this is a 
single node cluster 
   ```
 cluster = MiniOzoneCluster.newBuilder(conf)
 .setNumDatanodes(1)
 .build();
 cluster.waitForClusterToBeReady();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850407147


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2488725476

   > The change seems reasonable now. Please ensure we are testing the APIs we 
care about for a container after the container goes missing. I think we can 
generate more test cases via fault injection. We can discuss if it needs to be 
part of this PR or a separate PR (to keep this PR small).
   
   Should we keep the scope of this PR limited? We can look into adding a 
follow up jira for performing fault injection tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1850409369


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   I have added a loop for testing creating open container. Then post 
restarting I am creating a recovering container and performing the same set 
operations expecting it to succeed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-20 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849704065


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java:
##
@@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws 
Exception {
 }
   }
 
+  @Test
+  public void testOzoneContainerWithMissingContainer() throws Exception {
+MiniOzoneCluster cluster = null;
+try {
+  long containerID =
+  ContainerTestHelper.getTestContainerID();
+  OzoneConfiguration conf = newOzoneConfiguration();
+
+  // Start ozone container Via Datanode create.
+  cluster = MiniOzoneCluster.newBuilder(conf)
+  .setNumDatanodes(1)
+  .build();
+  cluster.waitForClusterToBeReady();
+
+  runTestOzoneContainerWithMissingContainer(cluster, containerID);
+} finally {
+  if (cluster != null) {
+cluster.shutdown();
+  }
+}
+  }
+
+  private void runTestOzoneContainerWithMissingContainer(
+  MiniOzoneCluster cluster, long testContainerID) throws Exception {
+ContainerProtos.ContainerCommandRequestProto
+request, writeChunkRequest, putBlockRequest,
+updateRequest1, updateRequest2;
+ContainerProtos.ContainerCommandResponseProto response,
+updateResponse1, updateResponse2;
+XceiverClientGrpc client = null;
+try {
+  // This client talks to ozone container via datanode.
+  client = createClientForTesting(cluster);
+  client.connect();
+  Pipeline pipeline = client.getPipeline();
+  createContainerForTesting(client, testContainerID);
+  writeChunkRequest = writeChunkForContainer(client, testContainerID,
+  1024);
+
+  DatanodeDetails datanodeDetails = 
cluster.getHddsDatanodes().get(0).getDatanodeDetails();
+  File containerPath =
+  new 
File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet().getContainer(testContainerID)
+  .getContainerData().getContainerPath());
+  cluster.getHddsDatanode(datanodeDetails).stop();
+  FileUtils.deleteDirectory(containerPath);
+
+  // Restart & Check if the container has been marked as missing, since 
the container directory has been deleted.
+  cluster.restartHddsDatanode(datanodeDetails, false);
+  GenericTestUtils.waitFor(() -> {
+try {
+  return 
cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine()
+  .getContainer().getContainerSet()
+  .getMissingContainerSet().contains(testContainerID);
+} catch (IOException e) {
+  return false;
+}
+  }, 1000, 3);
+
+  // Read Chunk

Review Comment:
   Should we add a second write chunk as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


kerneltime commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2487744095

   The change seems reasonable now. Please ensure we are testing the APIs we 
care about for a container after the container goes missing. I think we can 
generate more test cases via fault injection. We can discuss if it needs to be 
part of this PR or a separate PR (to keep this PR small). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849613595


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws
* @return Container
*/
   public Container getContainer(long containerId) {
-Preconditions.checkState(containerId >= 0,
-"Container Id cannot be negative.");
+Preconditions.checkState(containerId >= 0, "Container Id cannot be 
negative.");
 return containerMap.get(containerId);
   }
 
+  /**
+   * Removes container from both memory and database. This should be used when 
the containerData on disk has been
+   * removed completely from the node.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, true);
+  }
+
+  /**
+   * Removes containerId from memory. This needs to be used when the container 
is still present on disk, and the
+   * inmemory state of the container needs to be updated.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainerOnlyFromMemory(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, false);
+  }
+
+  /**
+   * Marks a container to be missing, thus it removes the container from 
inmemory containerMap and marks the
+   * container as missing.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeMissingContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, true, false);
+  }
+
   /**
* Removes the Container matching with specified containerId.
* @param containerId ID of the container to remove
* @return If container is removed from containerMap returns true, otherwise
* false
*/
-  public boolean removeContainer(long containerId) {
+  private boolean removeContainer(long containerId, boolean markMissing, 
boolean removeFromDB)

Review Comment:
   We don't have any use of that container information. Why keep it if it is 
not required?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849562938


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws
* @return Container
*/
   public Container getContainer(long containerId) {
-Preconditions.checkState(containerId >= 0,
-"Container Id cannot be negative.");
+Preconditions.checkState(containerId >= 0, "Container Id cannot be 
negative.");
 return containerMap.get(containerId);
   }
 
+  /**
+   * Removes container from both memory and database. This should be used when 
the containerData on disk has been
+   * removed completely from the node.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, true);
+  }
+
+  /**
+   * Removes containerId from memory. This needs to be used when the container 
is still present on disk, and the
+   * inmemory state of the container needs to be updated.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainerOnlyFromMemory(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, false);
+  }
+
+  /**
+   * Marks a container to be missing, thus it removes the container from 
inmemory containerMap and marks the
+   * container as missing.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeMissingContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, true, false);
+  }
+
   /**
* Removes the Container matching with specified containerId.
* @param containerId ID of the container to remove
* @return If container is removed from containerMap returns true, otherwise
* false
*/
-  public boolean removeContainer(long containerId) {
+  private boolean removeContainer(long containerId, boolean markMissing, 
boolean removeFromDB)

Review Comment:
   I was under the impression that once a container is created, it will remain 
the DB forever. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849423129


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws
* @return Container
*/
   public Container getContainer(long containerId) {
-Preconditions.checkState(containerId >= 0,
-"Container Id cannot be negative.");
+Preconditions.checkState(containerId >= 0, "Container Id cannot be 
negative.");
 return containerMap.get(containerId);
   }
 
+  /**
+   * Removes container from both memory and database. This should be used when 
the containerData on disk has been
+   * removed completely from the node.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, true);
+  }
+
+  /**
+   * Removes containerId from memory. This needs to be used when the container 
is still present on disk, and the
+   * inmemory state of the container needs to be updated.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainerOnlyFromMemory(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, false);
+  }
+
+  /**
+   * Marks a container to be missing, thus it removes the container from 
inmemory containerMap and marks the
+   * container as missing.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeMissingContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, true, false);
+  }
+
   /**
* Removes the Container matching with specified containerId.
* @param containerId ID of the container to remove
* @return If container is removed from containerMap returns true, otherwise
* false
*/
-  public boolean removeContainer(long containerId) {
+  private boolean removeContainer(long containerId, boolean markMissing, 
boolean removeFromDB)

Review Comment:
   **false, false** : This happens on finding a duplicate container on startup 
where the container info in memory gets swapped by removing and adding the 
updated container info.
   **true, false** : This happens on volume failure and we have to mark it as 
missing. This is a plain inmemory operation where in we remove the container 
from containerMap and add the containerId to the missingContainerSet.
   **false, true** : This is a regular delete flow which happens when SCM sends 
a delete container command. This will only happen after deleting the entire 
container data from the disk.
   **true, true** : This is an invalid case, this doesn't happen anywhere.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849374548


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  /**
+   * Add Container to container map. This would fail if the container is 
already present or has been marked as missing.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  /**
+   * Add Container to container map. This would overwrite the container even 
if it is missing. But would fail if the
+   * container is already present.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainerByOverwriteMissingContainer(Container 
container) throws StorageContainerException {
+return addContainer(container, true);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {

Review Comment:
   ```suggestion
 public void ensureContainerNotMissing(long containerId, State state) 
throws StorageContainerException {
   ```



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  /**
+   * Add Container to container map. This would fail if the container is 
already present or has been marked as missing.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  /**
+   * Add Container to container map. This would overwrite the container even 
if it is missing. But would fail if the
+   * container is already present.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainerByOverwriteMissingContainer(Container 
container) throws StorageContainerException {
+return addContainer(container, true);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {

Review Comment:
   Since this method throws an exception `ensure` should better, ideally this 
method just returns a boolean that is check and the called throws exception. 
This is a NIT. 



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  /**
+   * Add Container to container map. This would fail if the container is 
already present or has been marked as missing.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  /**
+   * Add Container to container map. This would overwrite the container even 
if it is missing. But would fail if the
+   * container is already present.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainerByOverwriteMissingContainer(Container 
container) throws StorageContainerException {
+return addContainer(container, true);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {
+if (missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d with state : %s is missing in" +
+  " the DN.", containerId, state),
+  ContainerProtos.Result.CONTAINER_MISSING);
+}
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  private boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws

Review Comment:
   ```suggestion
 private boolean addContainer(Container container, boolean overwrite) 
throws
   ```



##
hadoop-hdds/cont

Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849413986


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  /**
+   * Add Container to container map. This would fail if the container is 
already present or has been marked as missing.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  /**
+   * Add Container to container map. This would overwrite the container even 
if it is missing. But would fail if the
+   * container is already present.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainerByOverwriteMissingContainer(Container 
container) throws StorageContainerException {
+return addContainer(container, true);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849414451


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  /**
+   * Add Container to container map. This would fail if the container is 
already present or has been marked as missing.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  /**
+   * Add Container to container map. This would overwrite the container even 
if it is missing. But would fail if the
+   * container is already present.
+   * @param container container to be added
+   * @return If container is added to containerMap returns true, otherwise
+   * false
+   */
+  public boolean addContainerByOverwriteMissingContainer(Container 
container) throws StorageContainerException {
+return addContainer(container, true);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {
+if (missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d with state : %s is missing in" +
+  " the DN.", containerId, state),
+  ContainerProtos.Result.CONTAINER_MISSING);
+}
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  private boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers) {
+  validateContainerIsMissing(containerId, containerState);
+}

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849414760


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws
* @return Container
*/
   public Container getContainer(long containerId) {
-Preconditions.checkState(containerId >= 0,
-"Container Id cannot be negative.");
+Preconditions.checkState(containerId >= 0, "Container Id cannot be 
negative.");
 return containerMap.get(containerId);
   }
 
+  /**
+   * Removes container from both memory and database. This should be used when 
the containerData on disk has been
+   * removed completely from the node.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, true);
+  }
+
+  /**
+   * Removes containerId from memory. This needs to be used when the container 
is still present on disk, and the
+   * inmemory state of the container needs to be updated.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeContainerOnlyFromMemory(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, false, false);
+  }
+
+  /**
+   * Marks a container to be missing, thus it removes the container from 
inmemory containerMap and marks the
+   * container as missing.
+   * @param containerId
+   * @return True if container is removed from containerMap.
+   * @throws StorageContainerException
+   */
+  public boolean removeMissingContainer(long containerId) throws 
StorageContainerException {
+return removeContainer(containerId, true, false);
+  }
+
   /**
* Removes the Container matching with specified containerId.
* @param containerId ID of the container to remove
* @return If container is removed from containerMap returns true, otherwise
* false
*/
-  public boolean removeContainer(long containerId) {
+  private boolean removeContainer(long containerId, boolean markMissing, 
boolean removeFromDB)

Review Comment:
   `removeFromDB` do we expect it to be true outside of testing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1849376591


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##
@@ -354,6 +354,15 @@ ContainerCommandResponseProto handleCreateContainer(
 }
 
 long containerID = request.getContainerID();
+State containerState = request.getCreateContainer().getState();
+
+if (containerState != RECOVERING) {
+  try {
+containerSet.validateContainerIsMissing(containerID, containerState);
+  } catch (StorageContainerException ex) {
+return ContainerUtils.logAndReturnError(LOG, ex, request);
+  }
+}

Review Comment:
   Had a discussion offline with @kerneltime . We have to perform this state 
check because EC container replication path follows the dispatcher flow. In 
order to allow replication manager to re create a missing container on the same 
node, we have to perform this check and override the check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2487026394

   
[MissingContainerSet](https://github.com/apache/ozone/blob/4e603aa93c8479349776c1597ef54504453cb512/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java#L62)
 in the containerSet was added as part of solving 
[HDDS-935](https://issues.apache.org/jira/browse/HDDS-935), which is trying to 
solve the same problem (**Avoid creating an already created container on a 
datanode in case of disk removal followed by datanode restart**) we are trying 
to tackle here. But unfortunately the solution only accounted for ratis 
container which was because it was only container type back then. The solution 
looks into adding all the containers that are present in the [ratis snapshot's 
containerBCSIdMap](https://github.com/apache/ozone/blob/2139367b52d68773f25d79abce0439323dde9671/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L313-L323)
 but were not found on disk 
 after scanning through all the data volumes in the node. 
   [HDDS-8448](https://issues.apache.org/jira/browse/HDDS-8448) caused a 
regression resulting removing the containers on a failed volume instead of 
marking it as missing.
   This patch is trying to do something along the same lines by extending the 
working for EC containers and is trying to fix the issues in the implementation 
for handling volume failures in the system. I believe this particular 
implementation was working fine for ratis container before HDDS-8448 caused a 
regression. Creating a duplicate data structure performing the same 
functionalities is just redundant and could lead to future bugs because we 
would have to update 2 structures in place of two. I also don't see any value 
in implementing something all over again which would also involve thorough 
testing of that flow. IMO we should be reusing the existing frameworks in the 
system to achieve something rather than add more flows complicating the already 
existing system further. 
   If you still strongly feel we should move this logic out of the ContainerSet 
class then my opinion would be to get rid of the missingContainerSet entirely 
from all of our code flows and have only one of the 2 but not both. @sodonnel 
@kerneltime @adoroszlai what do you think we should do here?
   I have addressed the other review comments on the patch (Getting rid of the 
Proto2EnumCodec, Removing ReferenceCountedDB for the 
WitnessedContainerMetadataStore)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1848751487


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeDBDefinition.java:
##
@@ -0,0 +1,72 @@
+package org.apache.hadoop.ozone.container.metadata;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+import 
org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.utils.db.DBColumnFamilyDefinition;
+import org.apache.hadoop.hdds.utils.db.DBDefinition;
+import org.apache.hadoop.hdds.utils.db.LongCodec;
+import org.apache.hadoop.hdds.utils.db.Proto2EnumCodec;
+import org.apache.hadoop.ozone.OzoneConsts;
+
+import java.util.Map;
+
+/**
+ * Class for defining the schema for master volume in a datanode.
+ */
+public final class MasterVolumeDBDefinition extends DBDefinition.WithMap {

Review Comment:
   done



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeMetadataStore.java:
##
@@ -0,0 +1,88 @@
+package org.apache.hadoop.ozone.container.metadata;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
+import org.apache.hadoop.hdds.utils.db.DBStore;
+import org.apache.hadoop.hdds.utils.db.DBStoreBuilder;
+import org.apache.hadoop.hdds.utils.db.Table;
+import org.apache.hadoop.hdds.utils.db.managed.ManagedDBOptions;
+import org.apache.hadoop.ozone.container.common.utils.BaseReferenceCountedDB;
+import org.apache.hadoop.ozone.container.common.utils.ReferenceCountedHandle;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+
+/**
+ * Class for interacting with database in the master volume of a datanode.
+ */
+public final class MasterVolumeMetadataStore extends 
AbstractRDBStore

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1848750611


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##
@@ -378,7 +387,11 @@ ContainerCommandResponseProto handleCreateContainer(
 try {
   if (containerSet.getContainer(containerID) == null) {
 newContainer.create(volumeSet, volumeChoosingPolicy, clusterId);
-created = containerSet.addContainer(newContainer);
+if (RECOVERING == newContainer.getContainerState()) {
+  created = 
containerSet.addContainerByOverwriteMissingContainer(newContainer);
+} else {
+  created = containerSet.addContainer(newContainer);
+}

Review Comment:
   If I rename the method name it would lead to change everywhere. Increasing 
the number of files again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1848704439


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java:
##
@@ -276,7 +277,8 @@ private ContainerCommandResponseProto dispatchRequest(
 getMissingContainerSet().remove(containerID);
   }
 }
-if (getMissingContainerSet().contains(containerID)) {
+if (cmdType != Type.CreateContainer && !HddsUtils.isReadOnly(msg)

Review Comment:
   We should allow writes to go through since we could be creating a RECOVERING 
container done by replication manager. As of date that is a problem. Once a 
container is marked missing we don't allow replication manager replicate the 
same container on the datanode. Consider we have only 5 nodes in a cluster and 
we lost one replica of an EC container because of volume failure. Now 
reconstruction will fail forever because the create container command fired 
would fail forever now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1848704439


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java:
##
@@ -276,7 +277,8 @@ private ContainerCommandResponseProto dispatchRequest(
 getMissingContainerSet().remove(containerID);
   }
 }
-if (getMissingContainerSet().contains(containerID)) {
+if (cmdType != Type.CreateContainer && !HddsUtils.isReadOnly(msg)

Review Comment:
   We should allow writes to go through since we could be creating a RECOVERING 
container done by replication manager. As of date that is a problem. Once a 
container is marked missing we don't allow replication manager replicate the 
same container on the datanode. Consider we have only 5 nodes and we lost one 
replica because of volume failure now reconstruction will fail forever because 
the create container command fired would fail forever now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-19 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2486155997

   > Why is the logic embedded into 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
 A cleaner approach would be expand via aggregation the container set logic.
   > 
   > 1. Create a new Java class (WitnessedContainers.java) that reads and 
writes to the DB and keeps a concurrent copy in memory to handle concurrency 
(the current code does this)
   > 2. Do not bother with missing Container logic at the level of 
WitnessedContainers.java. When the replication manager tries to create a 
container, it checks if it is in the witnessed list and then adds it. So we 
only need two methods for addition `AddContainer` and 
`AddContainerIfNotPresent`.
   > 
   
   Then what is the purpose of missing container set? We shouldn't have 2 data 
structures solving the same stuff. We should remove the missing container set 
and only rely on the witnessed container list.
   
   > Do we need reference counted code to be introduced? Can we get away with 
just embedding it within the ContainerSet?
   > 
   
   I had to do the reference counted db implementation just because of the test 
cases. The PR would have gotten much bigger to fix the test case. A simpler 
approach was to use the reference counted, we can have only one instance of a 
writable db.
   > I think this code change can go through more than one round of 
simplification for what it is trying to solve.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


kerneltime commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847754135


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java:
##
@@ -276,7 +277,8 @@ private ContainerCommandResponseProto dispatchRequest(
 getMissingContainerSet().remove(containerID);
   }
 }
-if (getMissingContainerSet().contains(containerID)) {
+if (cmdType != Type.CreateContainer && !HddsUtils.isReadOnly(msg)

Review Comment:
   Why do we need to expand this check?



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##
@@ -378,7 +387,11 @@ ContainerCommandResponseProto handleCreateContainer(
 try {
   if (containerSet.getContainer(containerID) == null) {
 newContainer.create(volumeSet, volumeChoosingPolicy, clusterId);
-created = containerSet.addContainer(newContainer);
+if (RECOVERING == newContainer.getContainerState()) {
+  created = 
containerSet.addContainerByOverwriteMissingContainer(newContainer);
+} else {
+  created = containerSet.addContainer(newContainer);
+}

Review Comment:
   ```suggestion
   if (RECOVERING == newContainer.getContainerState()) {
 created = containerSet.addContainer(newContainer);
   } else {
 created = containerSet.addContainerIfNotPresent(newContainer);
   }
   ```



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##
@@ -354,6 +354,15 @@ ContainerCommandResponseProto handleCreateContainer(
 }
 
 long containerID = request.getContainerID();
+State containerState = request.getCreateContainer().getState();
+
+if (containerState != RECOVERING) {
+  try {
+containerSet.validateContainerIsMissing(containerID, containerState);
+  } catch (StorageContainerException ex) {
+return ContainerUtils.logAndReturnError(LOG, ex, request);
+  }
+}

Review Comment:
   This code reads poorly and complicates a straighforward logic. The witnessed 
container list should NOT care about the state. 
   ```suggestion
 if (containerSet.wasContainerWitnessed(containerID) && containerState 
!= RECOVERING) {
LOG.warn("Trying to create a container already witnessed on this 
node which is not in recovering state);
return ContainerUtils.logAndReturnError(LOG, null, request);
 }
 
   ```



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeDBDefinition.java:
##
@@ -0,0 +1,72 @@
+package org.apache.hadoop.ozone.container.metadata;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+import 
org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.utils.db.DBColumnFamilyDefinition;
+import org.apache.hadoop.hdds.utils.db.DBDefinition;
+import org.apache.hadoop.hdds.utils.db.LongCodec;
+import org.apache.hadoop.hdds.utils.db.Proto2EnumCodec;
+import org.apache.hadoop.ozone.OzoneConsts;
+
+import java.util.Map;
+
+/**
+ * Class for defining the schema for master volume in a datanode.
+ */
+public final class MasterVolumeDBDefinition extends DBDefinition.WithMap {

Review Comment:
   This new vocabulary introduced in this PR will complicate the code and 
design literature for a long time.
   We don't have a `MasterVolume` in the design. We have locations (paths) 
where we store Ratis logs, or Datanode ID. These could all be one volume or any 
volume. Looking at the code, I do not like this terminology introduced.
   ```suggestion
   public final class WitnessedVolumeStore extends DBDefinition.WithMap {
   ```



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeMetadataStore.java:
##
@@ -0,0 +1,88 @@
+package org.apache.hadoop.ozone.container.metadata;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more c

Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2484265616

   @adoroszlai I have addressed all the comments. Do you have any more review 
comments?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847209966


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractStore.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.metadata;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.utils.db.BatchOperationHandler;
+import org.apache.hadoop.hdds.utils.db.DBStore;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * Abstract Interface for interacting with datanode databases.
+ */
+public interface AbstractStore extends Closeable {

Review Comment:
   We need a base interface for initialising a rocksdb in datanode alone, like 
loading the datanode profile and other things that comes allied with it. I 
believe there is a duplicate interface for OM & SCM as well. We should pick 
this up as a follow up task item and unify these things in one place.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847209966


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractStore.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.metadata;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.utils.db.BatchOperationHandler;
+import org.apache.hadoop.hdds.utils.db.DBStore;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * Abstract Interface for interacting with datanode databases.
+ */
+public interface AbstractStore extends Closeable {

Review Comment:
   We need a base interface for initialising a rocksdb in datanode alone, like 
loading the datanode profile and other things that comes allied with it. I 
believe there is a duplicate interface for OM & SCM as well. We should pick 
this up as a follow up task item.  



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java:
##
@@ -175,7 +175,8 @@ private boolean canIgnoreException(Result result) {
 case CONTAINER_UNHEALTHY:
 case CLOSED_CONTAINER_IO:
 case DELETE_ON_OPEN_CONTAINER:
-case UNSUPPORTED_REQUEST: // Blame client for sending unsupported request.
+case UNSUPPORTED_REQUEST:
+case CONTAINER_MISSING:// Blame client for sending unsupported request.

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847195130


##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/VoidCallable.java:
##
@@ -0,0 +1,26 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdds.utils;
+
+/**
+ * Defines a functional interface to call void returning function.
+ */
+@FunctionalInterface
+public interface VoidCallable {
+  void call() throws EXCEPTION_TYPE;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847194519


##
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/ClosedContainerReplicator.java:
##
@@ -82,66 +84,74 @@ public class ClosedContainerReplicator extends 
BaseFreonGenerator implements
   private ContainerReplicator replicator;
 
   private Timer timer;
+  private ReferenceCountedHandle 
masterVolumeMetadataStoreReferenceCountedDB;
 
   private List replicationTasks;
 
   @Override
   public Void call() throws Exception {
 
-OzoneConfiguration conf = createOzoneConfiguration();
+try {
+  OzoneConfiguration conf = createOzoneConfiguration();
 
-final Collection datanodeStorageDirs =
-HddsServerUtil.getDatanodeStorageDirs(conf);
+  final Collection datanodeStorageDirs =
+  HddsServerUtil.getDatanodeStorageDirs(conf);

Review Comment:
   done



##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBTestUtils.java:
##
@@ -0,0 +1,142 @@
+package org.apache.hadoop.hdds.utils.db;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+import org.apache.hadoop.hdds.utils.MetadataKeyFilters;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Util class for mocking DB interactions happening in various tests.
+ */
+public final class DBTestUtils {
+
+  private DBTestUtils() {
+
+  }
+
+  public static  Table getInMemoryTableForTest() {
+return new Table() {
+  private final Map map = new ConcurrentHashMap<>();

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847163669


##
hadoop-ozone/dist/src/main/compose/compatibility/docker-config:
##
@@ -21,7 +21,7 @@ 
OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB
 OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
 OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
 OZONE-SITE.XML_ozone.scm.names=scm
-OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data
+OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata

Review Comment:
   We want the datanode.id file & the rocksdb to be present in the same volume. 
Keeping them in the same directory becomes logical in that case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847167460


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeDBDefinition.java:
##
@@ -0,0 +1,72 @@
+package org.apache.hadoop.ozone.container.metadata;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+import 
org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.utils.db.DBColumnFamilyDefinition;
+import org.apache.hadoop.hdds.utils.db.DBDefinition;
+import org.apache.hadoop.hdds.utils.db.LongCodec;
+import org.apache.hadoop.hdds.utils.db.Proto2EnumCodec;
+import org.apache.hadoop.ozone.OzoneConsts;
+
+import java.util.Map;
+
+/**
+ * Class for defining the schema for master volume in a datanode.

Review Comment:
   Master volume is defined in this doc:
   
https://docs.google.com/document/d/1k4Tc7P-Jszdwn4dooRQZiKctlcGWNlnjZO1fh_TBneM/edit?usp=sharing
 . It is the same volume where we have datanode.id file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


adoroszlai commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847142768


##
hadoop-ozone/dist/src/main/compose/compatibility/docker-config:
##
@@ -21,7 +21,7 @@ 
OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB
 OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
 OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
 OZONE-SITE.XML_ozone.scm.names=scm
-OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data
+OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata

Review Comment:
   So why do we need to create DB in the `ozone.scm.datanode.id.dir`, which is 
for the datanode ID file?  Can we use `ozone.metadata.dirs`, which is already 
configured to `/data/metadata`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1847119413


##
hadoop-ozone/dist/src/main/compose/compatibility/docker-config:
##
@@ -21,7 +21,7 @@ 
OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB
 OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
 OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
 OZONE-SITE.XML_ozone.scm.names=scm
-OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data
+OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata

Review Comment:
   Creation of rocksdb under /data fails because of permission issue. So had to 
create under a metadata directory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


adoroszlai commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1846979936


##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/VoidCallable.java:
##
@@ -0,0 +1,26 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdds.utils;
+
+/**
+ * Defines a functional interface to call void returning function.
+ */
+@FunctionalInterface
+public interface VoidCallable {
+  void call() throws EXCEPTION_TYPE;

Review Comment:
   Please use `CheckedRunnable`.



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java:
##
@@ -175,7 +175,8 @@ private boolean canIgnoreException(Result result) {
 case CONTAINER_UNHEALTHY:
 case CLOSED_CONTAINER_IO:
 case DELETE_ON_OPEN_CONTAINER:
-case UNSUPPORTED_REQUEST: // Blame client for sending unsupported request.
+case UNSUPPORTED_REQUEST:
+case CONTAINER_MISSING:// Blame client for sending unsupported request.

Review Comment:
   Comment about `unsupported request` belongs to `UNSUPPORTED_REQUEST` case...



##
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/ClosedContainerReplicator.java:
##
@@ -82,66 +84,74 @@ public class ClosedContainerReplicator extends 
BaseFreonGenerator implements
   private ContainerReplicator replicator;
 
   private Timer timer;
+  private ReferenceCountedHandle 
masterVolumeMetadataStoreReferenceCountedDB;
 
   private List replicationTasks;
 
   @Override
   public Void call() throws Exception {
 
-OzoneConfiguration conf = createOzoneConfiguration();
+try {
+  OzoneConfiguration conf = createOzoneConfiguration();
 
-final Collection datanodeStorageDirs =
-HddsServerUtil.getDatanodeStorageDirs(conf);
+  final Collection datanodeStorageDirs =
+  HddsServerUtil.getDatanodeStorageDirs(conf);

Review Comment:
   This change is just wrapping existing code in `try-finally`, with new code 
added only in `finally`.  You can reduce the change by:
   
   - rename existing `call()` to something else (using `oldCall()` here as 
example)
   - add new `call()` as:
   
   ```java
 public Void call() throws Exception {
   try {
 oldCall();
   } finally {
 ...
   }
 }
   ```
   
   This way indentation is unchanged, diff is reduced.



##
hadoop-ozone/dist/src/main/compose/compatibility/docker-config:
##
@@ -21,7 +21,7 @@ 
OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB
 OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
 OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
 OZONE-SITE.XML_ozone.scm.names=scm
-OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data
+OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata

Review Comment:
   Can you please explain this part of the change?



##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractStore.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.metadata;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.utils.db.BatchOperationHandler;
+import org.apache.hadoop.hdds.utils.db.DBStore;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**

Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483724162

   > The latest change looks OK to me, but I want someone else to review it too.
   > 
   > In the future, please structure large changes like this as a series of 
smaller logical commits and push them to the PR unsquashed. It makes it so much 
easier to work with, and if someone disagrees with some part, easier to revert 
and change too.
   > 
   > Also, as we found, we halved the number of changed files with a few simple 
changes. Perhaps these changes were not perfect from an OO perspective or if 
you were doing this from scratch, but minimizing the wider impact of a change 
and reducing code change elsewhere is worth some small compromises on the code 
structure.
   
   Yeah I will keep this in mind for future patches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


sodonnel commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483689674

   The latest change looks OK to me, but I want someone else to review it too.
   
   In the future, please structure large changes like this as a series of 
smaller logical commits and push them to the PR unsquashed. It makes it so much 
easier to work with, and if someone disagrees with some part, easier to revert 
and change too.
   
   Also, as we found, we halved the number of changed files with a few simple 
changes. Perhaps these changes were not perfect from an OO perspective or if 
you were doing this from scratch, but minimizing the wider impact of a change 
and reducing code change elsewhere is worth some small compromises on the code 
structure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483532007

   > > This is not an expensive operation. We anyways keep the entire 
containerMap in memory. There is no point in adding an overhead of rocksdb get, 
when we have the entire data in memory.
   > 
   > The point is - either use RocksDB, or don't use it - don't use half in 
memory and half on RocksDB. We are managing two structures (a map and a table), 
when one can do it.
   > 
   > You must read the entire small table from RocksDB on startup - so it will 
pull it into RockDB memory naturally. After that, writes will get buffered in 
RocksDB memory, so doing an existence check should be very fast. Removing the 
reliance on the missing containers map simplifies the code making it cleaner 
overall.
   > 
   > > This can be done but we would go back to the same problem of changing 
all the test files in that case.
   > 
   > Not if we keep the original add / remove for the common case and then 
rename the overloaded versions to better specify their intent.
   
   Done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483530822

   > > > This is not an expensive operation. We anyways keep the entire 
containerMap in memory. There is no point in adding an overhead of rocksdb get, 
when we have the entire data in memory.
   > > 
   > > 
   > > The point is - either use RocksDB, or don't use it - don't use half in 
memory and half on RocksDB. We are managing two structures (a map and a table), 
when one can do it.
   > 
   > We are just using Rocksdb as a ledger and nothing more than that. I didn't 
want to change the existing data structure. All of these have have been working 
before when things are completely in memory. I am justing adding an operation 
to also add it to rocksdb and recreate the entire data structure on startup by 
doing so.
   > 
   > > You must read the entire small table from RocksDB on startup - so it 
will pull it into RockDB memory naturally. After that, writes will get buffered 
in RocksDB memory, so doing an existence check should be very fast. Removing 
the reliance on the missing containers map simplifies the code making it 
cleaner overall.
   > > > This can be done but we would go back to the same problem of changing 
all the test files in that case.
   > > 
   > > 
   > > Not if we keep the original add / remove for the common case and then 
rename the overloaded versions to better specify their intent.
   
   Done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483305066

   > > This is not an expensive operation. We anyways keep the entire 
containerMap in memory. There is no point in adding an overhead of rocksdb get, 
when we have the entire data in memory.
   > 
   > The point is - either use RocksDB, or don't use it - don't use half in 
memory and half on RocksDB. We are managing two structures (a map and a table), 
when one can do it.
   
   We are just using Rocksdb as a ledger and nothing more than that. I didn't 
want to change the existing data structure. All of these have have been working 
before when things are completely in memory. I am justing adding an operation 
to also add it to rocksdb and recreate the entire data structure on startup by 
doing so.
   
   > You must read the entire small table from RocksDB on startup - so it will 
pull it into RockDB memory naturally. After that, writes will get buffered in 
RocksDB memory, so doing an existence check should be very fast. Removing the 
reliance on the missing containers map simplifies the code making it cleaner 
overall.
   > 
   > > This can be done but we would go back to the same problem of changing 
all the test files in that case.
   > 
   > Not if we keep the original add / remove for the common case and then 
rename the overloaded versions to better specify their intent.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


sodonnel commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483253600

   > This is not an expensive operation. We anyways keep the entire 
containerMap in memory. There is no point in adding an overhead of rocksdb get, 
when we have the entire data in memory.
   
   The point is - either use RocksDB, or don't use it - don't use half in 
memory and half on RocksDB. We are managing two structures (a map and a table), 
when one can do it.
   
   You must read the entire small table from RocksDB on startup - so it will 
pull it into RockDB memory naturally. After that, writes will get buffered in 
RocksDB memory, so doing an existence check should be very fast. Removing the 
reliance on the missing containers map simplifies the code making it cleaner 
overall.
   
   > This can be done but we would go back to the same problem of changing all 
the test files in that case.
   
   Not if we keep the original add / remove for the common case and then rename 
the overloaded versions to better specify their intent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483000946

   > Thanks for reducing the changed files. Its down to 47 now from 97, which 
is a big improvement.
   > 
   > I still think its confusing that we have two structures (missingContainers 
map and ContainerIDsTable) to track what is going on.
   > 
   > If we want to block creating a container that has been created before, can 
we just check for existence in the ContainerIDsTable? Right now we check the 
missingTable and then add an entry to the containerIDTable.
   MissingContainerSet is always in memory. We add it to the containerIdTable 
and remove the entry from the missingContainerSet.

   > If we remove a container, there are two boolean parameters - markMissing 
and removeFromDB - is it ever valid to pass "true, true" so it marks it missing 
and removes it from the DB?
   removeFromDB is only false when the physical container directory is not 
deleted from the node. So true, true should not be possible.
   **false, false** : This happens on finding a duplicate container on startup 
where the container info in memory gets swapped by removing and adding the 
updated container info.
   **true, false** : This happens on volume failure and we have to mark it as 
missing. This is a plain inmemory operation where in we remove the container 
from containerMap and add the containerId to the missingContainerSet.
   **false, true** : This is a regular delete flow which happens when SCM sends 
a delete container command. This will only happen after deleting the entire 
container data from the disk.
   **true, true** : This is an invalid case, this doesn't happen anywhere.
   
   > Provided we always add to the containerIDsTable on new containerCreation, 
then we should be able to check for previous creation from it too.
Since we have the containerInfo in the containerMap in memory already. 
There is no point in going to the rocksdb.
   
   > It feels like we can remove the container for a failed volume we don't 
touch the containerIDs table. Something that tries to recreate it will get 
blocked by the existence chance against the table.
   > 
   > Or, we remove it "safely" via replication, balancer and we remove it from 
the DB.
   On replication we always have the overwrite flag 
**overwriteMissingContainers** set to true.
   
   > If we just use the containerID table, then you don't need to iterate it at 
startup and checkoff all the containers. You also don't need to hold anything 
in memory as the RocksDB will cache things itself. However we may need to 
populate the table from all existing containers on disk.
   This is not an expensive operation. We anyways keep the entire containerMap 
in memory. There is no point in adding an overhead of rocksdb get, when we have 
the entire data in memory. 
   > 
   > I also suspect the code would be more understandable if we didn't overload 
add/removeContainer in ContainerSet with the booleans, and instead create a new 
method `add???` and `remove???` which are named more inline with what they are 
doing, eg remove, removeSafely, removeUnexpectedly, or something along those 
lines.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2482944743

   > Where does `continerIdsTable` get populated with the pre-existing list of 
containers on the datanode after the first restart when this code change is 
added? Maybe I have missed, but I cannot see where it happens. Does it need to 
get populated, so it contains the complete list of containers and hence is able 
to detect future failures?
   
   It happens as part of the ContainerReader which calls the addContainer in 
the containerSet
   
https://github.com/apache/ozone/blob/f22f0d172b552359f9cfbd60ddde3a4bdd66c8fd/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java#L216-L232
   
   After the containerReader run:
   we build the missingContainerSet by reading through the existing values in 
the DB
   
https://github.com/apache/ozone/blob/5b3d27aac581d9b79b13cf7d22f378cbf6950575/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java#L331-L352


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


sodonnel commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2482965692

   Thanks for reducing the changed files. Its down to 47 now from 97, which is 
a big improvement.
   
   I still think its confusing that we have two structures (missingContainers 
map and ContainerIDsTable) to track what is going on.
   
   If we want to block creating a container that has been created before, can 
we just check for existence in the ContainerIDsTable? Right now we check the 
missingTable and then add an entry to the containerIDTable.
   
   If we remove a container, there are two boolean parameters - markMissing and 
removeFromDB - is it ever valid to pass "true, true" so it marks it missing and 
removes it from the DB?
   
   Provided we always add to the containerIDsTable on new containerCreation, 
then we should be able to check for previous creation from it too.
   
   It feels like we can remove the container for a failed volume we don't touch 
the containerIDs table. Something that tries to recreate it will get blocked by 
the existence chance against the table.
   
   Or, we remove it "safely" via replication, balancer and we remove it from 
the DB.
   
   If we just use the containerID table, then you don't need to iterate it at 
startup and checkoff all the containers. You also don't need to hold anything 
in memory as the RocksDB will cache things itself. However we may need to 
populate the table from all existing containers on disk.
   
   I also suspect the code would be more understandable if we didn't overload 
add/removeContainer in ContainerSet with the booleans, and instead create a new 
method `add???` and `remove???` which are named more inline with what they are 
doing, eg remove, removeSafely, removeUnexpectedly, or something along those 
lines.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-18 Thread via GitHub


sodonnel commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2482924973

   Where does `continerIdsTable` get populated with the pre-existing list of 
containers on the datanode after the first restart when this code change is 
added? Maybe I have missed, but I cannot see where it happens. Does it need to 
get populated, so it contains the complete list of containers and hence is able 
to detect future failures? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1845694473


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/DBHandle.java:
##
@@ -17,24 +17,24 @@
  */
 package org.apache.hadoop.ozone.container.common.interfaces;
 
-import org.apache.hadoop.ozone.container.metadata.DatanodeStore;
+import org.apache.hadoop.ozone.container.metadata.AbstractStore;
 
 import java.io.Closeable;
 
 /**
  * DB handle abstract class.
  */
-public abstract class DBHandle implements Closeable {
+public abstract class DBHandle implements 
Closeable {

Review Comment:
   yeah I have limited the changes to only the AbstractStore changes. I don't 
see a point in bringing down the changes further & I don't think there is going 
to be a lot of conflict while backporting the AbstractStore changes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1845652019


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -65,10 +69,19 @@ public class ContainerSet implements Iterable> 
{
   new ConcurrentSkipListMap<>();
   private Clock clock;
   private long recoveringTimeout;
+  private final Table containerIdsTable;
+
+  public ContainerSet(Table continerIdsTable, long 
recoveringTimeout) {

Review Comment:
   If we kept the existing constructor and marked it a visibleForTest, then 
make it create a "test version" of this class, would it reduce all the test 
files that need to be changed?
   
   ```
   public ContainerSet(long recoveringTimeout) {
   this(ImMemoryTable, recoveringTimeout, true / false);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1845664489


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,46 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {
+if (missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d with state : %s is missing in" +
+  " the DN.", containerId, state),
+  ContainerProtos.Result.CONTAINER_MISSING);
+}
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws

Review Comment:
   There is already a method for that.
   
https://github.com/apache/ozone/blob/af0f757622a82843864bf5aaa93b2470a398f442/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java#L106-L108



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1845663513


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -65,10 +69,19 @@ public class ContainerSet implements Iterable> 
{
   new ConcurrentSkipListMap<>();
   private Clock clock;
   private long recoveringTimeout;
+  private final Table containerIdsTable;
+
+  public ContainerSet(Table continerIdsTable, long 
recoveringTimeout) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


sodonnel commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2481580569

   > All of the tests use these containerSet. If we pass a null value it would 
make the container set a read only data structure(I have put guardrails in 
place so that we don't end up inadvertently writing containers without 
persisting the data). Since we have made the table a must have arg we are 
having to pass a mocked table in all the existing tests.
   
   It is worth thinking about how these knock on effect could be minimised - I 
made one suggesting in a comment about keeping the existing construction and 
having it create the test object.
   
   > Another big change. We didn't have a pre-existing rocksdb on the master 
volume. DatanodeStore was only defined for data volumes, I wanted to reuse the 
code for creating a master volume that ended up adding generic everywhere, 
having created a base interface everywhere.
   
   This is refactoring things. At the very least, I'd like to see this done in 
a series of smaller commits on the PR - not all as one big commit. Its harder 
to follow the thought process and harder to review. However when a change like 
this causes such a large knock on effect, its worth considering if there is 
some other way we can avoid changing a lot of other code and still get some 
code reuse.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1845652291


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/DBHandle.java:
##
@@ -17,24 +17,24 @@
  */
 package org.apache.hadoop.ozone.container.common.interfaces;
 
-import org.apache.hadoop.ozone.container.metadata.DatanodeStore;
+import org.apache.hadoop.ozone.container.metadata.AbstractStore;
 
 import java.io.Closeable;
 
 /**
  * DB handle abstract class.
  */
-public abstract class DBHandle implements Closeable {
+public abstract class DBHandle implements 
Closeable {

Review Comment:
   Can we avoid making changes to this class and get the desired functionality 
another way (extending another base class, something else)? Changing this has a 
knock on effect on many other classes and this feels like refactoring that 
perhaps isn't needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-17 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1845651827


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,46 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
+  public void validateContainerIsMissing(long containerId, State state) throws 
StorageContainerException {
+if (missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d with state : %s is missing in" +
+  " the DN.", containerId, state),
+  ContainerProtos.Result.CONTAINER_MISSING);
+}
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws

Review Comment:
   I would suggest adding a new method here - addContainerIgnoringMissing (or 
something similar), rather than changing the existing interface. That way, it 
avoids changes in existing calling code, and only code that explicitly needs to 
pass the boolean can be changed to call the new method. I suspect this change 
would reduce the number of files impacted by this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844646765


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d is missing in the DN " +

Review Comment:
   There is a function which doesn't overwrite containers. We should be 
performing this check on every container add. In KeyValueHandler we are having 
to do it there because the creation of KeyValueContainerData creates the 
required directory structure in the data volume. So we are having to provide 
this check. But whenever we are adding to data structure from other paths we 
have to ensure the container is not missing by default unless we explicity 
intend to overwrite the missing container.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844218841


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   I feel we should track all the containers getting added to the system. This 
flow also accomodates for replication both RATIS & EC. Barring the exception we 
give replication job to overwrite missing containers we should prevent 
overwriting any container coming through any flow. If we have this check in a 
central place, any future code changes would be foolproof. At least things 
would break and not silently end up writing something which is wrong.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844194989


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d is missing in the DN " +

Review Comment:
   KeyValueHandler is catching the exception and wrapping the exception as a 
response.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844206697


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d is missing in the DN " +

Review Comment:
   Yeah you are right. This is redundant. Actually let me shift this logic to a 
single function so that we don't have redundant code. But we should perform the 
missing container check on add container.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844194989


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d is missing in the DN " +

Review Comment:
   KeyValueHandler is catching the exception and wrapping the exception as an 
error response.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844189615


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {
+  throw new StorageContainerException(String.format("Container with 
container Id %d is missing in the DN " +

Review Comment:
   KeyValueHandler has this similar logic - why is it needed in both places? 
Feels like containerSet (ie here) may not need to check this, if its checked in 
the handler that creates the containers on the write path?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1844138325


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   Prior to this change, the missingContainerSet variable in the ContainerSet 
was not used for anything within that class. Now, we are potentially adding to 
it from the containerList - what impact does that have more widely? What was 
the missingContainerSet used for before?
   
   We also have to open the new containerList table - it would be more clear if 
we just loaded that into memory and use the new "seenContainerList" for 
previously existing checks.
   
   However backing up a bit - how can containers get created?
   
   One way, is via the write path - that is the problem we are trying to solve.
   
   Another is via EC Reconstruction - there is already logic there to deal with 
a duplicate container I think, however it may not cover volume failure. However 
EC reconstruction effectively uses the same write path as normal writes, so it 
its basically the same as the write path.
   
   Then there is replication, which is a very different path into the system.
   
   Maybe this problem is better not solved in ContainerSet, but at the higher 
level in the dispatcher where the writes are being handled?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


swamirishi commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2479371440

   > For me, this change is too big, and its going to cause conflicts and be 
difficult to backport. Its also hard to review, as the number of changed files 
is massive, although many are test files and simple one line changes.
   > 
   > It feels like this change could be largely contained with the ContainerSet 
class, without changing a lot of generic classes and causing the ripple effect 
into the wider system. All it has to do, is create a new overloaded constructor 
and not change the existing one. Make the existing one call the new one with 
the correct runtime requirements. Then we just need to check if a container has 
previously existing or not, and add new containers to the "previously exists" 
list. This shouldn't require changing 90 something files in a single PR.
   
   All of the tests use these containerSet. If we pass a null value it would 
make the container set a read only data structure(I have put guardrails in 
place so that we don't end up inadvertently writing containers without 
persisting the data). Since we have made the table a must have arg we are 
having to pass a mocked table in all the existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-15 Thread via GitHub


sodonnel commented on PR #7402:
URL: https://github.com/apache/ozone/pull/7402#issuecomment-2479329553

   For me, this change is too big, and its going to cause conflicts and be 
difficult to backport. Its also hard to review, as the number of changed files 
is massive, although many are test files and simple one line changes.
   
   It feels like this change could be largely contained with the ContainerSet 
class, without changing a lot of generic classes and causing the ripple effect 
into the wider system. All it has to do, is create a new overloaded constructor 
and not change the existing one. Make the existing one call the new one with 
the correct runtime requirements. Then we just need to check if a container has 
previously existing or not, and add new containers to the "previously exists" 
list. This shouldn't require changing 90 something files in a single PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838698017


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   I am just reusing the data structure which was already there. This missing 
container set was just used for ratis container initially, I am now making it 
track even the EC containers. I didn't want to change a lot of code and reuse 
most of the stuff which is already there.
   Moreover an in memory get is faster than fetching from rocksdb



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838698017


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   I am just reusing the data structure which was already there. This missing 
container set was just used for ratis container initially, I am now making it 
track even the EC containers. I didn't want to change a lot of code and reuse 
most of the stuff which is already there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838669075


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   Ah ok, I mis-read the Java doc and thought it was a ratis snapshot. However 
my question still stands - why not simply use the ContainerIDTable to check for 
previous container creation, than augmenting this missing list?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838587332


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   The container set is first initialised with all the containers present on 
disk by performing an ls on the data volumes. Then we iterate through the 
containerIDs table on the rocksdb, and any container present in this table and 
not present on disk will be added to the missing container set. This would 
include both ratis and ec containers.
   Now on container creation we just have to check this in the in-memory set 
than going to the rocksdb everytime which would incur an io cost, however small 
it is not need in my opinion when we have everything in memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838587332


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   The container set is first initialised with all the containers present on 
disk by performing an ls on the data volumes. Then we iterate through the 
containerIDs table on the rocksdb, and any container present in this table and 
not present on disk will be added to the missing container set. This would 
include both ratis and ec containers.
   Now on container creation we just have to check this in memory set than 
going to the rocksdb everytime which would incur an io cost, however small it 
is not need in my opinion when we have everything in memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


swamirishi commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838587332


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   The container set is first initialised with all the containers present on 
disk by performing an ls on the data volumes. Then we iterate through the 
containerIDs table on the rocksdb, and any container present in this table and 
not present on disk will be added to the missing container set. This would 
include both ratis and ec containers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]

2024-11-12 Thread via GitHub


sodonnel commented on code in PR #7402:
URL: https://github.com/apache/ozone/pull/7402#discussion_r1838421679


##
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##
@@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) {
 this.recoveringTimeout = recoveringTimeout;
   }
 
+  public boolean addContainer(Container container) throws 
StorageContainerException {
+return addContainer(container, false);
+  }
+
   /**
* Add Container to container map.
* @param container container to be added
* @return If container is added to containerMap returns true, otherwise
* false
*/
-  public boolean addContainer(Container container) throws
+  public boolean addContainer(Container container, boolean 
overwriteMissingContainers) throws
   StorageContainerException {
 Preconditions.checkNotNull(container, "container cannot be null");
 
 long containerId = container.getContainerData().getContainerID();
+State containerState = container.getContainerData().getState();
+if (!overwriteMissingContainers && 
missingContainerSet.contains(containerId)) {

Review Comment:
   I don't understand why we cannot just check if the container is in the 
ContainerIDTable? Why do we need to use this missingContainerSet instead?
   
   I traced the creation of the missing set to a ratis snapshot (I know nothing 
about it) and a diff against the containers found in memory. What about EC 
containers? Do they all become part of this missing set?
   
   Prior to this change, the missingSet doesn't seem to be used for anything 
inside this class, but not it is used for these checks. What was missingSet 
used for before?
   
   I would have thought the ContainerID table starts initially empty after this 
change is committed. On first start, it gets populated with all containers in 
the ContainerSet (scanned from disk). After initial startup, or later startups, 
it can also have all containers added, but they should already all be there. 
Then we can just track previously created in the containerID table?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org