Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
adoroszlai commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1856969129 ## hadoop-ozone/dist/src/main/compose/compatibility/docker-config: ## @@ -21,7 +21,7 @@ OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1 OZONE-SITE.XML_ozone.scm.names=scm -OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data +OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata Review Comment: @swamirishi Turns out similar change is needed for kubernetes clusters, but the check was silently failing. Please see #7483. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850789106 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: I have created a follow up jira for performing this validation: https://issues.apache.org/jira/browse/HDDS-11761 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850781332 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: in a follow up jira please accept write chunks only for EC pipelines if container is recovering. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi merged PR #7402: URL: https://github.com/apache/ozone/pull/7402 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2489270850 Thanks for the review on the patch @kerneltime @sodonnel @adoroszlai -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850756295 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: I am testing both creating an open container and recovering container. Agnostic of whether it is an EC key or a ratis key we land in the same dispatcher flow. So we don't have to test this separately https://github.com/apache/ozone/blob/827bc86915efbb398f024ccc467de5db7ecd347d/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java#L253-L256 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850756295 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: I am testing both creating an open container and recovering container. Agnostic of whether it is an EC key or a ratis key we land in the same dispatcher flow. So we don't have to test this separately https://github.com/apache/ozone/blob/827bc86915efbb398f024ccc467de5db7ecd347d/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java#L253-L256 https://github.com/apache/ozone/blob/827bc86915efbb398f024ccc467de5db7ecd347d/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java#L292-L296 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850756763 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + /** + * Add Container to container map. This would fail if the container is already present or has been marked as missing. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + /** + * Add Container to container map. This would overwrite the container even if it is missing. But would fail if the + * container is already present. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainerByOverwriteMissingContainer(Container container) throws StorageContainerException { +return addContainer(container, true); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { +if (missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d with state : %s is missing in" + + " the DN.", containerId, state), + ContainerProtos.Result.CONTAINER_MISSING); +} + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + private boolean addContainer(Container container, boolean overwriteMissingContainers) throws Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850697263 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: Recovering container accepting writes should be for EC only? Do we want write chunk coming for a non EC container to fail? In this test this is a single node cluster ``` cluster = MiniOzoneCluster.newBuilder(conf) .setNumDatanodes(1) .build(); cluster.waitForClusterToBeReady(); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850407147 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2488725476 > The change seems reasonable now. Please ensure we are testing the APIs we care about for a container after the container goes missing. I think we can generate more test cases via fault injection. We can discuss if it needs to be part of this PR or a separate PR (to keep this PR small). Should we keep the scope of this PR limited? We can look into adding a follow up jira for performing fault injection tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1850409369 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: I have added a loop for testing creating open container. Then post restarting I am creating a recovering container and performing the same set operations expecting it to succeed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849704065 ## hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainer.java: ## @@ -158,6 +161,134 @@ public void testOzoneContainerViaDataNode() throws Exception { } } + @Test + public void testOzoneContainerWithMissingContainer() throws Exception { +MiniOzoneCluster cluster = null; +try { + long containerID = + ContainerTestHelper.getTestContainerID(); + OzoneConfiguration conf = newOzoneConfiguration(); + + // Start ozone container Via Datanode create. + cluster = MiniOzoneCluster.newBuilder(conf) + .setNumDatanodes(1) + .build(); + cluster.waitForClusterToBeReady(); + + runTestOzoneContainerWithMissingContainer(cluster, containerID); +} finally { + if (cluster != null) { +cluster.shutdown(); + } +} + } + + private void runTestOzoneContainerWithMissingContainer( + MiniOzoneCluster cluster, long testContainerID) throws Exception { +ContainerProtos.ContainerCommandRequestProto +request, writeChunkRequest, putBlockRequest, +updateRequest1, updateRequest2; +ContainerProtos.ContainerCommandResponseProto response, +updateResponse1, updateResponse2; +XceiverClientGrpc client = null; +try { + // This client talks to ozone container via datanode. + client = createClientForTesting(cluster); + client.connect(); + Pipeline pipeline = client.getPipeline(); + createContainerForTesting(client, testContainerID); + writeChunkRequest = writeChunkForContainer(client, testContainerID, + 1024); + + DatanodeDetails datanodeDetails = cluster.getHddsDatanodes().get(0).getDatanodeDetails(); + File containerPath = + new File(cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet().getContainer(testContainerID) + .getContainerData().getContainerPath()); + cluster.getHddsDatanode(datanodeDetails).stop(); + FileUtils.deleteDirectory(containerPath); + + // Restart & Check if the container has been marked as missing, since the container directory has been deleted. + cluster.restartHddsDatanode(datanodeDetails, false); + GenericTestUtils.waitFor(() -> { +try { + return cluster.getHddsDatanode(datanodeDetails).getDatanodeStateMachine() + .getContainer().getContainerSet() + .getMissingContainerSet().contains(testContainerID); +} catch (IOException e) { + return false; +} + }, 1000, 3); + + // Read Chunk Review Comment: Should we add a second write chunk as well? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2487744095 The change seems reasonable now. Please ensure we are testing the APIs we care about for a container after the container goes missing. I think we can generate more test cases via fault injection. We can discuss if it needs to be part of this PR or a separate PR (to keep this PR small). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849613595 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws * @return Container */ public Container getContainer(long containerId) { -Preconditions.checkState(containerId >= 0, -"Container Id cannot be negative."); +Preconditions.checkState(containerId >= 0, "Container Id cannot be negative."); return containerMap.get(containerId); } + /** + * Removes container from both memory and database. This should be used when the containerData on disk has been + * removed completely from the node. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, true); + } + + /** + * Removes containerId from memory. This needs to be used when the container is still present on disk, and the + * inmemory state of the container needs to be updated. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainerOnlyFromMemory(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, false); + } + + /** + * Marks a container to be missing, thus it removes the container from inmemory containerMap and marks the + * container as missing. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeMissingContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, true, false); + } + /** * Removes the Container matching with specified containerId. * @param containerId ID of the container to remove * @return If container is removed from containerMap returns true, otherwise * false */ - public boolean removeContainer(long containerId) { + private boolean removeContainer(long containerId, boolean markMissing, boolean removeFromDB) Review Comment: We don't have any use of that container information. Why keep it if it is not required? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849562938 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws * @return Container */ public Container getContainer(long containerId) { -Preconditions.checkState(containerId >= 0, -"Container Id cannot be negative."); +Preconditions.checkState(containerId >= 0, "Container Id cannot be negative."); return containerMap.get(containerId); } + /** + * Removes container from both memory and database. This should be used when the containerData on disk has been + * removed completely from the node. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, true); + } + + /** + * Removes containerId from memory. This needs to be used when the container is still present on disk, and the + * inmemory state of the container needs to be updated. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainerOnlyFromMemory(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, false); + } + + /** + * Marks a container to be missing, thus it removes the container from inmemory containerMap and marks the + * container as missing. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeMissingContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, true, false); + } + /** * Removes the Container matching with specified containerId. * @param containerId ID of the container to remove * @return If container is removed from containerMap returns true, otherwise * false */ - public boolean removeContainer(long containerId) { + private boolean removeContainer(long containerId, boolean markMissing, boolean removeFromDB) Review Comment: I was under the impression that once a container is created, it will remain the DB forever. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849423129 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws * @return Container */ public Container getContainer(long containerId) { -Preconditions.checkState(containerId >= 0, -"Container Id cannot be negative."); +Preconditions.checkState(containerId >= 0, "Container Id cannot be negative."); return containerMap.get(containerId); } + /** + * Removes container from both memory and database. This should be used when the containerData on disk has been + * removed completely from the node. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, true); + } + + /** + * Removes containerId from memory. This needs to be used when the container is still present on disk, and the + * inmemory state of the container needs to be updated. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainerOnlyFromMemory(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, false); + } + + /** + * Marks a container to be missing, thus it removes the container from inmemory containerMap and marks the + * container as missing. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeMissingContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, true, false); + } + /** * Removes the Container matching with specified containerId. * @param containerId ID of the container to remove * @return If container is removed from containerMap returns true, otherwise * false */ - public boolean removeContainer(long containerId) { + private boolean removeContainer(long containerId, boolean markMissing, boolean removeFromDB) Review Comment: **false, false** : This happens on finding a duplicate container on startup where the container info in memory gets swapped by removing and adding the updated container info. **true, false** : This happens on volume failure and we have to mark it as missing. This is a plain inmemory operation where in we remove the container from containerMap and add the containerId to the missingContainerSet. **false, true** : This is a regular delete flow which happens when SCM sends a delete container command. This will only happen after deleting the entire container data from the disk. **true, true** : This is an invalid case, this doesn't happen anywhere. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849374548 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + /** + * Add Container to container map. This would fail if the container is already present or has been marked as missing. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + /** + * Add Container to container map. This would overwrite the container even if it is missing. But would fail if the + * container is already present. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainerByOverwriteMissingContainer(Container container) throws StorageContainerException { +return addContainer(container, true); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { Review Comment: ```suggestion public void ensureContainerNotMissing(long containerId, State state) throws StorageContainerException { ``` ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + /** + * Add Container to container map. This would fail if the container is already present or has been marked as missing. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + /** + * Add Container to container map. This would overwrite the container even if it is missing. But would fail if the + * container is already present. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainerByOverwriteMissingContainer(Container container) throws StorageContainerException { +return addContainer(container, true); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { Review Comment: Since this method throws an exception `ensure` should better, ideally this method just returns a boolean that is check and the called throws exception. This is a NIT. ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + /** + * Add Container to container map. This would fail if the container is already present or has been marked as missing. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + /** + * Add Container to container map. This would overwrite the container even if it is missing. But would fail if the + * container is already present. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainerByOverwriteMissingContainer(Container container) throws StorageContainerException { +return addContainer(container, true); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { +if (missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d with state : %s is missing in" + + " the DN.", containerId, state), + ContainerProtos.Result.CONTAINER_MISSING); +} + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + private boolean addContainer(Container container, boolean overwriteMissingContainers) throws Review Comment: ```suggestion private boolean addContainer(Container container, boolean overwrite) throws ``` ## hadoop-hdds/cont
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849413986 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + /** + * Add Container to container map. This would fail if the container is already present or has been marked as missing. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + /** + * Add Container to container map. This would overwrite the container even if it is missing. But would fail if the + * container is already present. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainerByOverwriteMissingContainer(Container container) throws StorageContainerException { +return addContainer(container, true); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849414451 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +103,63 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + /** + * Add Container to container map. This would fail if the container is already present or has been marked as missing. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + /** + * Add Container to container map. This would overwrite the container even if it is missing. But would fail if the + * container is already present. + * @param container container to be added + * @return If container is added to containerMap returns true, otherwise + * false + */ + public boolean addContainerByOverwriteMissingContainer(Container container) throws StorageContainerException { +return addContainer(container, true); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { +if (missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d with state : %s is missing in" + + " the DN.", containerId, state), + ContainerProtos.Result.CONTAINER_MISSING); +} + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + private boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers) { + validateContainerIsMissing(containerId, containerState); +} Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849414760 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -122,21 +181,69 @@ public boolean addContainer(Container container) throws * @return Container */ public Container getContainer(long containerId) { -Preconditions.checkState(containerId >= 0, -"Container Id cannot be negative."); +Preconditions.checkState(containerId >= 0, "Container Id cannot be negative."); return containerMap.get(containerId); } + /** + * Removes container from both memory and database. This should be used when the containerData on disk has been + * removed completely from the node. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, true); + } + + /** + * Removes containerId from memory. This needs to be used when the container is still present on disk, and the + * inmemory state of the container needs to be updated. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeContainerOnlyFromMemory(long containerId) throws StorageContainerException { +return removeContainer(containerId, false, false); + } + + /** + * Marks a container to be missing, thus it removes the container from inmemory containerMap and marks the + * container as missing. + * @param containerId + * @return True if container is removed from containerMap. + * @throws StorageContainerException + */ + public boolean removeMissingContainer(long containerId) throws StorageContainerException { +return removeContainer(containerId, true, false); + } + /** * Removes the Container matching with specified containerId. * @param containerId ID of the container to remove * @return If container is removed from containerMap returns true, otherwise * false */ - public boolean removeContainer(long containerId) { + private boolean removeContainer(long containerId, boolean markMissing, boolean removeFromDB) Review Comment: `removeFromDB` do we expect it to be true outside of testing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1849376591 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java: ## @@ -354,6 +354,15 @@ ContainerCommandResponseProto handleCreateContainer( } long containerID = request.getContainerID(); +State containerState = request.getCreateContainer().getState(); + +if (containerState != RECOVERING) { + try { +containerSet.validateContainerIsMissing(containerID, containerState); + } catch (StorageContainerException ex) { +return ContainerUtils.logAndReturnError(LOG, ex, request); + } +} Review Comment: Had a discussion offline with @kerneltime . We have to perform this state check because EC container replication path follows the dispatcher flow. In order to allow replication manager to re create a missing container on the same node, we have to perform this check and override the check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2487026394 [MissingContainerSet](https://github.com/apache/ozone/blob/4e603aa93c8479349776c1597ef54504453cb512/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java#L62) in the containerSet was added as part of solving [HDDS-935](https://issues.apache.org/jira/browse/HDDS-935), which is trying to solve the same problem (**Avoid creating an already created container on a datanode in case of disk removal followed by datanode restart**) we are trying to tackle here. But unfortunately the solution only accounted for ratis container which was because it was only container type back then. The solution looks into adding all the containers that are present in the [ratis snapshot's containerBCSIdMap](https://github.com/apache/ozone/blob/2139367b52d68773f25d79abce0439323dde9671/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L313-L323) but were not found on disk after scanning through all the data volumes in the node. [HDDS-8448](https://issues.apache.org/jira/browse/HDDS-8448) caused a regression resulting removing the containers on a failed volume instead of marking it as missing. This patch is trying to do something along the same lines by extending the working for EC containers and is trying to fix the issues in the implementation for handling volume failures in the system. I believe this particular implementation was working fine for ratis container before HDDS-8448 caused a regression. Creating a duplicate data structure performing the same functionalities is just redundant and could lead to future bugs because we would have to update 2 structures in place of two. I also don't see any value in implementing something all over again which would also involve thorough testing of that flow. IMO we should be reusing the existing frameworks in the system to achieve something rather than add more flows complicating the already existing system further. If you still strongly feel we should move this logic out of the ContainerSet class then my opinion would be to get rid of the missingContainerSet entirely from all of our code flows and have only one of the 2 but not both. @sodonnel @kerneltime @adoroszlai what do you think we should do here? I have addressed the other review comments on the patch (Getting rid of the Proto2EnumCodec, Removing ReferenceCountedDB for the WitnessedContainerMetadataStore) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1848751487 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeDBDefinition.java: ## @@ -0,0 +1,72 @@ +package org.apache.hadoop.ozone.container.metadata; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State; +import org.apache.hadoop.hdds.scm.ScmConfigKeys; +import org.apache.hadoop.hdds.utils.db.DBColumnFamilyDefinition; +import org.apache.hadoop.hdds.utils.db.DBDefinition; +import org.apache.hadoop.hdds.utils.db.LongCodec; +import org.apache.hadoop.hdds.utils.db.Proto2EnumCodec; +import org.apache.hadoop.ozone.OzoneConsts; + +import java.util.Map; + +/** + * Class for defining the schema for master volume in a datanode. + */ +public final class MasterVolumeDBDefinition extends DBDefinition.WithMap { Review Comment: done ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeMetadataStore.java: ## @@ -0,0 +1,88 @@ +package org.apache.hadoop.ozone.container.metadata; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +import org.apache.hadoop.hdds.conf.ConfigurationSource; +import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos; +import org.apache.hadoop.hdds.utils.db.DBStore; +import org.apache.hadoop.hdds.utils.db.DBStoreBuilder; +import org.apache.hadoop.hdds.utils.db.Table; +import org.apache.hadoop.hdds.utils.db.managed.ManagedDBOptions; +import org.apache.hadoop.ozone.container.common.utils.BaseReferenceCountedDB; +import org.apache.hadoop.ozone.container.common.utils.ReferenceCountedHandle; + +import java.io.IOException; +import java.io.UncheckedIOException; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentMap; + +/** + * Class for interacting with database in the master volume of a datanode. + */ +public final class MasterVolumeMetadataStore extends AbstractRDBStore Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1848750611 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java: ## @@ -378,7 +387,11 @@ ContainerCommandResponseProto handleCreateContainer( try { if (containerSet.getContainer(containerID) == null) { newContainer.create(volumeSet, volumeChoosingPolicy, clusterId); -created = containerSet.addContainer(newContainer); +if (RECOVERING == newContainer.getContainerState()) { + created = containerSet.addContainerByOverwriteMissingContainer(newContainer); +} else { + created = containerSet.addContainer(newContainer); +} Review Comment: If I rename the method name it would lead to change everywhere. Increasing the number of files again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1848704439 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java: ## @@ -276,7 +277,8 @@ private ContainerCommandResponseProto dispatchRequest( getMissingContainerSet().remove(containerID); } } -if (getMissingContainerSet().contains(containerID)) { +if (cmdType != Type.CreateContainer && !HddsUtils.isReadOnly(msg) Review Comment: We should allow writes to go through since we could be creating a RECOVERING container done by replication manager. As of date that is a problem. Once a container is marked missing we don't allow replication manager replicate the same container on the datanode. Consider we have only 5 nodes in a cluster and we lost one replica of an EC container because of volume failure. Now reconstruction will fail forever because the create container command fired would fail forever now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1848704439 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java: ## @@ -276,7 +277,8 @@ private ContainerCommandResponseProto dispatchRequest( getMissingContainerSet().remove(containerID); } } -if (getMissingContainerSet().contains(containerID)) { +if (cmdType != Type.CreateContainer && !HddsUtils.isReadOnly(msg) Review Comment: We should allow writes to go through since we could be creating a RECOVERING container done by replication manager. As of date that is a problem. Once a container is marked missing we don't allow replication manager replicate the same container on the datanode. Consider we have only 5 nodes and we lost one replica because of volume failure now reconstruction will fail forever because the create container command fired would fail forever now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2486155997 > Why is the logic embedded into hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java A cleaner approach would be expand via aggregation the container set logic. > > 1. Create a new Java class (WitnessedContainers.java) that reads and writes to the DB and keeps a concurrent copy in memory to handle concurrency (the current code does this) > 2. Do not bother with missing Container logic at the level of WitnessedContainers.java. When the replication manager tries to create a container, it checks if it is in the witnessed list and then adds it. So we only need two methods for addition `AddContainer` and `AddContainerIfNotPresent`. > Then what is the purpose of missing container set? We shouldn't have 2 data structures solving the same stuff. We should remove the missing container set and only rely on the witnessed container list. > Do we need reference counted code to be introduced? Can we get away with just embedding it within the ContainerSet? > I had to do the reference counted db implementation just because of the test cases. The PR would have gotten much bigger to fix the test case. A simpler approach was to use the reference counted, we can have only one instance of a writable db. > I think this code change can go through more than one round of simplification for what it is trying to solve. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
kerneltime commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847754135 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java: ## @@ -276,7 +277,8 @@ private ContainerCommandResponseProto dispatchRequest( getMissingContainerSet().remove(containerID); } } -if (getMissingContainerSet().contains(containerID)) { +if (cmdType != Type.CreateContainer && !HddsUtils.isReadOnly(msg) Review Comment: Why do we need to expand this check? ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java: ## @@ -378,7 +387,11 @@ ContainerCommandResponseProto handleCreateContainer( try { if (containerSet.getContainer(containerID) == null) { newContainer.create(volumeSet, volumeChoosingPolicy, clusterId); -created = containerSet.addContainer(newContainer); +if (RECOVERING == newContainer.getContainerState()) { + created = containerSet.addContainerByOverwriteMissingContainer(newContainer); +} else { + created = containerSet.addContainer(newContainer); +} Review Comment: ```suggestion if (RECOVERING == newContainer.getContainerState()) { created = containerSet.addContainer(newContainer); } else { created = containerSet.addContainerIfNotPresent(newContainer); } ``` ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java: ## @@ -354,6 +354,15 @@ ContainerCommandResponseProto handleCreateContainer( } long containerID = request.getContainerID(); +State containerState = request.getCreateContainer().getState(); + +if (containerState != RECOVERING) { + try { +containerSet.validateContainerIsMissing(containerID, containerState); + } catch (StorageContainerException ex) { +return ContainerUtils.logAndReturnError(LOG, ex, request); + } +} Review Comment: This code reads poorly and complicates a straighforward logic. The witnessed container list should NOT care about the state. ```suggestion if (containerSet.wasContainerWitnessed(containerID) && containerState != RECOVERING) { LOG.warn("Trying to create a container already witnessed on this node which is not in recovering state); return ContainerUtils.logAndReturnError(LOG, null, request); } ``` ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeDBDefinition.java: ## @@ -0,0 +1,72 @@ +package org.apache.hadoop.ozone.container.metadata; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State; +import org.apache.hadoop.hdds.scm.ScmConfigKeys; +import org.apache.hadoop.hdds.utils.db.DBColumnFamilyDefinition; +import org.apache.hadoop.hdds.utils.db.DBDefinition; +import org.apache.hadoop.hdds.utils.db.LongCodec; +import org.apache.hadoop.hdds.utils.db.Proto2EnumCodec; +import org.apache.hadoop.ozone.OzoneConsts; + +import java.util.Map; + +/** + * Class for defining the schema for master volume in a datanode. + */ +public final class MasterVolumeDBDefinition extends DBDefinition.WithMap { Review Comment: This new vocabulary introduced in this PR will complicate the code and design literature for a long time. We don't have a `MasterVolume` in the design. We have locations (paths) where we store Ratis logs, or Datanode ID. These could all be one volume or any volume. Looking at the code, I do not like this terminology introduced. ```suggestion public final class WitnessedVolumeStore extends DBDefinition.WithMap { ``` ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeMetadataStore.java: ## @@ -0,0 +1,88 @@ +package org.apache.hadoop.ozone.container.metadata; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2484265616 @adoroszlai I have addressed all the comments. Do you have any more review comments? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847209966 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractStore.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.  See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.  The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.  You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.ozone.container.metadata; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.hadoop.hdds.conf.ConfigurationSource; +import org.apache.hadoop.hdds.utils.db.BatchOperationHandler; +import org.apache.hadoop.hdds.utils.db.DBStore; + +import java.io.Closeable; +import java.io.IOException; + +/** + * Abstract Interface for interacting with datanode databases. + */ +public interface AbstractStore extends Closeable { Review Comment: We need a base interface for initialising a rocksdb in datanode alone, like loading the datanode profile and other things that comes allied with it. I believe there is a duplicate interface for OM & SCM as well. We should pick this up as a follow up task item and unify these things in one place. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847209966 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractStore.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.  See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.  The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.  You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.ozone.container.metadata; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.hadoop.hdds.conf.ConfigurationSource; +import org.apache.hadoop.hdds.utils.db.BatchOperationHandler; +import org.apache.hadoop.hdds.utils.db.DBStore; + +import java.io.Closeable; +import java.io.IOException; + +/** + * Abstract Interface for interacting with datanode databases. + */ +public interface AbstractStore extends Closeable { Review Comment: We need a base interface for initialising a rocksdb in datanode alone, like loading the datanode profile and other things that comes allied with it. I believe there is a duplicate interface for OM & SCM as well. We should pick this up as a follow up task item. ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java: ## @@ -175,7 +175,8 @@ private boolean canIgnoreException(Result result) { case CONTAINER_UNHEALTHY: case CLOSED_CONTAINER_IO: case DELETE_ON_OPEN_CONTAINER: -case UNSUPPORTED_REQUEST: // Blame client for sending unsupported request. +case UNSUPPORTED_REQUEST: +case CONTAINER_MISSING:// Blame client for sending unsupported request. Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847195130 ## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/VoidCallable.java: ## @@ -0,0 +1,26 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hdds.utils; + +/** + * Defines a functional interface to call void returning function. + */ +@FunctionalInterface +public interface VoidCallable { + void call() throws EXCEPTION_TYPE; Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847194519 ## hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/ClosedContainerReplicator.java: ## @@ -82,66 +84,74 @@ public class ClosedContainerReplicator extends BaseFreonGenerator implements private ContainerReplicator replicator; private Timer timer; + private ReferenceCountedHandle masterVolumeMetadataStoreReferenceCountedDB; private List replicationTasks; @Override public Void call() throws Exception { -OzoneConfiguration conf = createOzoneConfiguration(); +try { + OzoneConfiguration conf = createOzoneConfiguration(); -final Collection datanodeStorageDirs = -HddsServerUtil.getDatanodeStorageDirs(conf); + final Collection datanodeStorageDirs = + HddsServerUtil.getDatanodeStorageDirs(conf); Review Comment: done ## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBTestUtils.java: ## @@ -0,0 +1,142 @@ +package org.apache.hadoop.hdds.utils.db; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +import org.apache.hadoop.hdds.utils.MetadataKeyFilters; + +import java.io.File; +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; + +/** + * Util class for mocking DB interactions happening in various tests. + */ +public final class DBTestUtils { + + private DBTestUtils() { + + } + + public static Table getInMemoryTableForTest() { +return new Table() { + private final Map map = new ConcurrentHashMap<>(); Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847163669 ## hadoop-ozone/dist/src/main/compose/compatibility/docker-config: ## @@ -21,7 +21,7 @@ OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1 OZONE-SITE.XML_ozone.scm.names=scm -OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data +OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata Review Comment: We want the datanode.id file & the rocksdb to be present in the same volume. Keeping them in the same directory becomes logical in that case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847167460 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/MasterVolumeDBDefinition.java: ## @@ -0,0 +1,72 @@ +package org.apache.hadoop.ozone.container.metadata; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerDataProto.State; +import org.apache.hadoop.hdds.scm.ScmConfigKeys; +import org.apache.hadoop.hdds.utils.db.DBColumnFamilyDefinition; +import org.apache.hadoop.hdds.utils.db.DBDefinition; +import org.apache.hadoop.hdds.utils.db.LongCodec; +import org.apache.hadoop.hdds.utils.db.Proto2EnumCodec; +import org.apache.hadoop.ozone.OzoneConsts; + +import java.util.Map; + +/** + * Class for defining the schema for master volume in a datanode. Review Comment: Master volume is defined in this doc: https://docs.google.com/document/d/1k4Tc7P-Jszdwn4dooRQZiKctlcGWNlnjZO1fh_TBneM/edit?usp=sharing . It is the same volume where we have datanode.id file. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
adoroszlai commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847142768 ## hadoop-ozone/dist/src/main/compose/compatibility/docker-config: ## @@ -21,7 +21,7 @@ OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1 OZONE-SITE.XML_ozone.scm.names=scm -OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data +OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata Review Comment: So why do we need to create DB in the `ozone.scm.datanode.id.dir`, which is for the datanode ID file? Can we use `ozone.metadata.dirs`, which is already configured to `/data/metadata`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1847119413 ## hadoop-ozone/dist/src/main/compose/compatibility/docker-config: ## @@ -21,7 +21,7 @@ OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1 OZONE-SITE.XML_ozone.scm.names=scm -OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data +OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata Review Comment: Creation of rocksdb under /data fails because of permission issue. So had to create under a metadata directory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
adoroszlai commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1846979936 ## hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/VoidCallable.java: ## @@ -0,0 +1,26 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hdds.utils; + +/** + * Defines a functional interface to call void returning function. + */ +@FunctionalInterface +public interface VoidCallable { + void call() throws EXCEPTION_TYPE; Review Comment: Please use `CheckedRunnable`. ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java: ## @@ -175,7 +175,8 @@ private boolean canIgnoreException(Result result) { case CONTAINER_UNHEALTHY: case CLOSED_CONTAINER_IO: case DELETE_ON_OPEN_CONTAINER: -case UNSUPPORTED_REQUEST: // Blame client for sending unsupported request. +case UNSUPPORTED_REQUEST: +case CONTAINER_MISSING:// Blame client for sending unsupported request. Review Comment: Comment about `unsupported request` belongs to `UNSUPPORTED_REQUEST` case... ## hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/ClosedContainerReplicator.java: ## @@ -82,66 +84,74 @@ public class ClosedContainerReplicator extends BaseFreonGenerator implements private ContainerReplicator replicator; private Timer timer; + private ReferenceCountedHandle masterVolumeMetadataStoreReferenceCountedDB; private List replicationTasks; @Override public Void call() throws Exception { -OzoneConfiguration conf = createOzoneConfiguration(); +try { + OzoneConfiguration conf = createOzoneConfiguration(); -final Collection datanodeStorageDirs = -HddsServerUtil.getDatanodeStorageDirs(conf); + final Collection datanodeStorageDirs = + HddsServerUtil.getDatanodeStorageDirs(conf); Review Comment: This change is just wrapping existing code in `try-finally`, with new code added only in `finally`. You can reduce the change by: - rename existing `call()` to something else (using `oldCall()` here as example) - add new `call()` as: ```java public Void call() throws Exception { try { oldCall(); } finally { ... } } ``` This way indentation is unchanged, diff is reduced. ## hadoop-ozone/dist/src/main/compose/compatibility/docker-config: ## @@ -21,7 +21,7 @@ OZONE-SITE.XML_ozone.scm.datanode.ratis.volume.free-space.min=10MB OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1 OZONE-SITE.XML_ozone.scm.names=scm -OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data +OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data/metadata Review Comment: Can you please explain this part of the change? ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractStore.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.  See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.  The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.  You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.ozone.container.metadata; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.hadoop.hdds.conf.ConfigurationSource; +import org.apache.hadoop.hdds.utils.db.BatchOperationHandler; +import org.apache.hadoop.hdds.utils.db.DBStore; + +import java.io.Closeable; +import java.io.IOException; + +/**
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483724162 > The latest change looks OK to me, but I want someone else to review it too. > > In the future, please structure large changes like this as a series of smaller logical commits and push them to the PR unsquashed. It makes it so much easier to work with, and if someone disagrees with some part, easier to revert and change too. > > Also, as we found, we halved the number of changed files with a few simple changes. Perhaps these changes were not perfect from an OO perspective or if you were doing this from scratch, but minimizing the wider impact of a change and reducing code change elsewhere is worth some small compromises on the code structure. Yeah I will keep this in mind for future patches. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483689674 The latest change looks OK to me, but I want someone else to review it too. In the future, please structure large changes like this as a series of smaller logical commits and push them to the PR unsquashed. It makes it so much easier to work with, and if someone disagrees with some part, easier to revert and change too. Also, as we found, we halved the number of changed files with a few simple changes. Perhaps these changes were not perfect from an OO perspective or if you were doing this from scratch, but minimizing the wider impact of a change and reducing code change elsewhere is worth some small compromises on the code structure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483532007 > > This is not an expensive operation. We anyways keep the entire containerMap in memory. There is no point in adding an overhead of rocksdb get, when we have the entire data in memory. > > The point is - either use RocksDB, or don't use it - don't use half in memory and half on RocksDB. We are managing two structures (a map and a table), when one can do it. > > You must read the entire small table from RocksDB on startup - so it will pull it into RockDB memory naturally. After that, writes will get buffered in RocksDB memory, so doing an existence check should be very fast. Removing the reliance on the missing containers map simplifies the code making it cleaner overall. > > > This can be done but we would go back to the same problem of changing all the test files in that case. > > Not if we keep the original add / remove for the common case and then rename the overloaded versions to better specify their intent. Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483530822 > > > This is not an expensive operation. We anyways keep the entire containerMap in memory. There is no point in adding an overhead of rocksdb get, when we have the entire data in memory. > > > > > > The point is - either use RocksDB, or don't use it - don't use half in memory and half on RocksDB. We are managing two structures (a map and a table), when one can do it. > > We are just using Rocksdb as a ledger and nothing more than that. I didn't want to change the existing data structure. All of these have have been working before when things are completely in memory. I am justing adding an operation to also add it to rocksdb and recreate the entire data structure on startup by doing so. > > > You must read the entire small table from RocksDB on startup - so it will pull it into RockDB memory naturally. After that, writes will get buffered in RocksDB memory, so doing an existence check should be very fast. Removing the reliance on the missing containers map simplifies the code making it cleaner overall. > > > This can be done but we would go back to the same problem of changing all the test files in that case. > > > > > > Not if we keep the original add / remove for the common case and then rename the overloaded versions to better specify their intent. Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483305066 > > This is not an expensive operation. We anyways keep the entire containerMap in memory. There is no point in adding an overhead of rocksdb get, when we have the entire data in memory. > > The point is - either use RocksDB, or don't use it - don't use half in memory and half on RocksDB. We are managing two structures (a map and a table), when one can do it. We are just using Rocksdb as a ledger and nothing more than that. I didn't want to change the existing data structure. All of these have have been working before when things are completely in memory. I am justing adding an operation to also add it to rocksdb and recreate the entire data structure on startup by doing so. > You must read the entire small table from RocksDB on startup - so it will pull it into RockDB memory naturally. After that, writes will get buffered in RocksDB memory, so doing an existence check should be very fast. Removing the reliance on the missing containers map simplifies the code making it cleaner overall. > > > This can be done but we would go back to the same problem of changing all the test files in that case. > > Not if we keep the original add / remove for the common case and then rename the overloaded versions to better specify their intent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483253600 > This is not an expensive operation. We anyways keep the entire containerMap in memory. There is no point in adding an overhead of rocksdb get, when we have the entire data in memory. The point is - either use RocksDB, or don't use it - don't use half in memory and half on RocksDB. We are managing two structures (a map and a table), when one can do it. You must read the entire small table from RocksDB on startup - so it will pull it into RockDB memory naturally. After that, writes will get buffered in RocksDB memory, so doing an existence check should be very fast. Removing the reliance on the missing containers map simplifies the code making it cleaner overall. > This can be done but we would go back to the same problem of changing all the test files in that case. Not if we keep the original add / remove for the common case and then rename the overloaded versions to better specify their intent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2483000946 > Thanks for reducing the changed files. Its down to 47 now from 97, which is a big improvement. > > I still think its confusing that we have two structures (missingContainers map and ContainerIDsTable) to track what is going on. > > If we want to block creating a container that has been created before, can we just check for existence in the ContainerIDsTable? Right now we check the missingTable and then add an entry to the containerIDTable. MissingContainerSet is always in memory. We add it to the containerIdTable and remove the entry from the missingContainerSet. > If we remove a container, there are two boolean parameters - markMissing and removeFromDB - is it ever valid to pass "true, true" so it marks it missing and removes it from the DB? removeFromDB is only false when the physical container directory is not deleted from the node. So true, true should not be possible. **false, false** : This happens on finding a duplicate container on startup where the container info in memory gets swapped by removing and adding the updated container info. **true, false** : This happens on volume failure and we have to mark it as missing. This is a plain inmemory operation where in we remove the container from containerMap and add the containerId to the missingContainerSet. **false, true** : This is a regular delete flow which happens when SCM sends a delete container command. This will only happen after deleting the entire container data from the disk. **true, true** : This is an invalid case, this doesn't happen anywhere. > Provided we always add to the containerIDsTable on new containerCreation, then we should be able to check for previous creation from it too. Since we have the containerInfo in the containerMap in memory already. There is no point in going to the rocksdb. > It feels like we can remove the container for a failed volume we don't touch the containerIDs table. Something that tries to recreate it will get blocked by the existence chance against the table. > > Or, we remove it "safely" via replication, balancer and we remove it from the DB. On replication we always have the overwrite flag **overwriteMissingContainers** set to true. > If we just use the containerID table, then you don't need to iterate it at startup and checkoff all the containers. You also don't need to hold anything in memory as the RocksDB will cache things itself. However we may need to populate the table from all existing containers on disk. This is not an expensive operation. We anyways keep the entire containerMap in memory. There is no point in adding an overhead of rocksdb get, when we have the entire data in memory. > > I also suspect the code would be more understandable if we didn't overload add/removeContainer in ContainerSet with the booleans, and instead create a new method `add???` and `remove???` which are named more inline with what they are doing, eg remove, removeSafely, removeUnexpectedly, or something along those lines. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2482944743 > Where does `continerIdsTable` get populated with the pre-existing list of containers on the datanode after the first restart when this code change is added? Maybe I have missed, but I cannot see where it happens. Does it need to get populated, so it contains the complete list of containers and hence is able to detect future failures? It happens as part of the ContainerReader which calls the addContainer in the containerSet https://github.com/apache/ozone/blob/f22f0d172b552359f9cfbd60ddde3a4bdd66c8fd/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerReader.java#L216-L232 After the containerReader run: we build the missingContainerSet by reading through the existing values in the DB https://github.com/apache/ozone/blob/5b3d27aac581d9b79b13cf7d22f378cbf6950575/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java#L331-L352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2482965692 Thanks for reducing the changed files. Its down to 47 now from 97, which is a big improvement. I still think its confusing that we have two structures (missingContainers map and ContainerIDsTable) to track what is going on. If we want to block creating a container that has been created before, can we just check for existence in the ContainerIDsTable? Right now we check the missingTable and then add an entry to the containerIDTable. If we remove a container, there are two boolean parameters - markMissing and removeFromDB - is it ever valid to pass "true, true" so it marks it missing and removes it from the DB? Provided we always add to the containerIDsTable on new containerCreation, then we should be able to check for previous creation from it too. It feels like we can remove the container for a failed volume we don't touch the containerIDs table. Something that tries to recreate it will get blocked by the existence chance against the table. Or, we remove it "safely" via replication, balancer and we remove it from the DB. If we just use the containerID table, then you don't need to iterate it at startup and checkoff all the containers. You also don't need to hold anything in memory as the RocksDB will cache things itself. However we may need to populate the table from all existing containers on disk. I also suspect the code would be more understandable if we didn't overload add/removeContainer in ContainerSet with the booleans, and instead create a new method `add???` and `remove???` which are named more inline with what they are doing, eg remove, removeSafely, removeUnexpectedly, or something along those lines. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2482924973 Where does `continerIdsTable` get populated with the pre-existing list of containers on the datanode after the first restart when this code change is added? Maybe I have missed, but I cannot see where it happens. Does it need to get populated, so it contains the complete list of containers and hence is able to detect future failures? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1845694473 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/DBHandle.java: ## @@ -17,24 +17,24 @@ */ package org.apache.hadoop.ozone.container.common.interfaces; -import org.apache.hadoop.ozone.container.metadata.DatanodeStore; +import org.apache.hadoop.ozone.container.metadata.AbstractStore; import java.io.Closeable; /** * DB handle abstract class. */ -public abstract class DBHandle implements Closeable { +public abstract class DBHandle implements Closeable { Review Comment: yeah I have limited the changes to only the AbstractStore changes. I don't see a point in bringing down the changes further & I don't think there is going to be a lot of conflict while backporting the AbstractStore changes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1845652019 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -65,10 +69,19 @@ public class ContainerSet implements Iterable> { new ConcurrentSkipListMap<>(); private Clock clock; private long recoveringTimeout; + private final Table containerIdsTable; + + public ContainerSet(Table continerIdsTable, long recoveringTimeout) { Review Comment: If we kept the existing constructor and marked it a visibleForTest, then make it create a "test version" of this class, would it reduce all the test files that need to be changed? ``` public ContainerSet(long recoveringTimeout) { this(ImMemoryTable, recoveringTimeout, true / false); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1845664489 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,46 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { +if (missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d with state : %s is missing in" + + " the DN.", containerId, state), + ContainerProtos.Result.CONTAINER_MISSING); +} + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws Review Comment: There is already a method for that. https://github.com/apache/ozone/blob/af0f757622a82843864bf5aaa93b2470a398f442/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java#L106-L108 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1845663513 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -65,10 +69,19 @@ public class ContainerSet implements Iterable> { new ConcurrentSkipListMap<>(); private Clock clock; private long recoveringTimeout; + private final Table containerIdsTable; + + public ContainerSet(Table continerIdsTable, long recoveringTimeout) { Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2481580569 > All of the tests use these containerSet. If we pass a null value it would make the container set a read only data structure(I have put guardrails in place so that we don't end up inadvertently writing containers without persisting the data). Since we have made the table a must have arg we are having to pass a mocked table in all the existing tests. It is worth thinking about how these knock on effect could be minimised - I made one suggesting in a comment about keeping the existing construction and having it create the test object. > Another big change. We didn't have a pre-existing rocksdb on the master volume. DatanodeStore was only defined for data volumes, I wanted to reuse the code for creating a master volume that ended up adding generic everywhere, having created a base interface everywhere. This is refactoring things. At the very least, I'd like to see this done in a series of smaller commits on the PR - not all as one big commit. Its harder to follow the thought process and harder to review. However when a change like this causes such a large knock on effect, its worth considering if there is some other way we can avoid changing a lot of other code and still get some code reuse. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1845652291 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/DBHandle.java: ## @@ -17,24 +17,24 @@ */ package org.apache.hadoop.ozone.container.common.interfaces; -import org.apache.hadoop.ozone.container.metadata.DatanodeStore; +import org.apache.hadoop.ozone.container.metadata.AbstractStore; import java.io.Closeable; /** * DB handle abstract class. */ -public abstract class DBHandle implements Closeable { +public abstract class DBHandle implements Closeable { Review Comment: Can we avoid making changes to this class and get the desired functionality another way (extending another base class, something else)? Changing this has a knock on effect on many other classes and this feels like refactoring that perhaps isn't needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1845651827 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,46 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + + public void validateContainerIsMissing(long containerId, State state) throws StorageContainerException { +if (missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d with state : %s is missing in" + + " the DN.", containerId, state), + ContainerProtos.Result.CONTAINER_MISSING); +} + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws Review Comment: I would suggest adding a new method here - addContainerIgnoringMissing (or something similar), rather than changing the existing interface. That way, it avoids changes in existing calling code, and only code that explicitly needs to pass the boolean can be changed to call the new method. I suspect this change would reduce the number of files impacted by this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844646765 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d is missing in the DN " + Review Comment: There is a function which doesn't overwrite containers. We should be performing this check on every container add. In KeyValueHandler we are having to do it there because the creation of KeyValueContainerData creates the required directory structure in the data volume. So we are having to provide this check. But whenever we are adding to data structure from other paths we have to ensure the container is not missing by default unless we explicity intend to overwrite the missing container. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844218841 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: I feel we should track all the containers getting added to the system. This flow also accomodates for replication both RATIS & EC. Barring the exception we give replication job to overwrite missing containers we should prevent overwriting any container coming through any flow. If we have this check in a central place, any future code changes would be foolproof. At least things would break and not silently end up writing something which is wrong. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844194989 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d is missing in the DN " + Review Comment: KeyValueHandler is catching the exception and wrapping the exception as a response. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844206697 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d is missing in the DN " + Review Comment: Yeah you are right. This is redundant. Actually let me shift this logic to a single function so that we don't have redundant code. But we should perform the missing container check on add container. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844194989 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d is missing in the DN " + Review Comment: KeyValueHandler is catching the exception and wrapping the exception as an error response. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844189615 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { + throw new StorageContainerException(String.format("Container with container Id %d is missing in the DN " + Review Comment: KeyValueHandler has this similar logic - why is it needed in both places? Feels like containerSet (ie here) may not need to check this, if its checked in the handler that creates the containers on the write path? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1844138325 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: Prior to this change, the missingContainerSet variable in the ContainerSet was not used for anything within that class. Now, we are potentially adding to it from the containerList - what impact does that have more widely? What was the missingContainerSet used for before? We also have to open the new containerList table - it would be more clear if we just loaded that into memory and use the new "seenContainerList" for previously existing checks. However backing up a bit - how can containers get created? One way, is via the write path - that is the problem we are trying to solve. Another is via EC Reconstruction - there is already logic there to deal with a duplicate container I think, however it may not cover volume failure. However EC reconstruction effectively uses the same write path as normal writes, so it its basically the same as the write path. Then there is replication, which is a very different path into the system. Maybe this problem is better not solved in ContainerSet, but at the higher level in the dispatcher where the writes are being handled? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2479371440 > For me, this change is too big, and its going to cause conflicts and be difficult to backport. Its also hard to review, as the number of changed files is massive, although many are test files and simple one line changes. > > It feels like this change could be largely contained with the ContainerSet class, without changing a lot of generic classes and causing the ripple effect into the wider system. All it has to do, is create a new overloaded constructor and not change the existing one. Make the existing one call the new one with the correct runtime requirements. Then we just need to check if a container has previously existing or not, and add new containers to the "previously exists" list. This shouldn't require changing 90 something files in a single PR. All of the tests use these containerSet. If we pass a null value it would make the container set a read only data structure(I have put guardrails in place so that we don't end up inadvertently writing containers without persisting the data). Since we have made the table a must have arg we are having to pass a mocked table in all the existing tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2479329553 For me, this change is too big, and its going to cause conflicts and be difficult to backport. Its also hard to review, as the number of changed files is massive, although many are test files and simple one line changes. It feels like this change could be largely contained with the ContainerSet class, without changing a lot of generic classes and causing the ripple effect into the wider system. All it has to do, is create a new overloaded constructor and not change the existing one. Make the existing one call the new one with the correct runtime requirements. Then we just need to check if a container has previously existing or not, and add new containers to the "previously exists" list. This shouldn't require changing 90 something files in a single PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838698017 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: I am just reusing the data structure which was already there. This missing container set was just used for ratis container initially, I am now making it track even the EC containers. I didn't want to change a lot of code and reuse most of the stuff which is already there. Moreover an in memory get is faster than fetching from rocksdb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838698017 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: I am just reusing the data structure which was already there. This missing container set was just used for ratis container initially, I am now making it track even the EC containers. I didn't want to change a lot of code and reuse most of the stuff which is already there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838669075 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: Ah ok, I mis-read the Java doc and thought it was a ratis snapshot. However my question still stands - why not simply use the ContainerIDTable to check for previous container creation, than augmenting this missing list? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838587332 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: The container set is first initialised with all the containers present on disk by performing an ls on the data volumes. Then we iterate through the containerIDs table on the rocksdb, and any container present in this table and not present on disk will be added to the missing container set. This would include both ratis and ec containers. Now on container creation we just have to check this in the in-memory set than going to the rocksdb everytime which would incur an io cost, however small it is not need in my opinion when we have everything in memory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838587332 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: The container set is first initialised with all the containers present on disk by performing an ls on the data volumes. Then we iterate through the containerIDs table on the rocksdb, and any container present in this table and not present on disk will be added to the missing container set. This would include both ratis and ec containers. Now on container creation we just have to check this in memory set than going to the rocksdb everytime which would incur an io cost, however small it is not need in my opinion when we have everything in memory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
swamirishi commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838587332 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: The container set is first initialised with all the containers present on disk by performing an ls on the data volumes. Then we iterate through the containerIDs table on the rocksdb, and any container present in this table and not present on disk will be added to the missing container set. This would include both ratis and ec containers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org
Re: [PR] HDDS-11650. ContainerId list to track all containers created in a datanode [ozone]
sodonnel commented on code in PR #7402: URL: https://github.com/apache/ozone/pull/7402#discussion_r1838421679 ## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java: ## @@ -85,22 +98,41 @@ public void setRecoveringTimeout(long recoveringTimeout) { this.recoveringTimeout = recoveringTimeout; } + public boolean addContainer(Container container) throws StorageContainerException { +return addContainer(container, false); + } + /** * Add Container to container map. * @param container container to be added * @return If container is added to containerMap returns true, otherwise * false */ - public boolean addContainer(Container container) throws + public boolean addContainer(Container container, boolean overwriteMissingContainers) throws StorageContainerException { Preconditions.checkNotNull(container, "container cannot be null"); long containerId = container.getContainerData().getContainerID(); +State containerState = container.getContainerData().getState(); +if (!overwriteMissingContainers && missingContainerSet.contains(containerId)) { Review Comment: I don't understand why we cannot just check if the container is in the ContainerIDTable? Why do we need to use this missingContainerSet instead? I traced the creation of the missing set to a ratis snapshot (I know nothing about it) and a diff against the containers found in memory. What about EC containers? Do they all become part of this missing set? Prior to this change, the missingSet doesn't seem to be used for anything inside this class, but not it is used for these checks. What was missingSet used for before? I would have thought the ContainerID table starts initially empty after this change is committed. On first start, it gets populated with all containers in the ContainerSet (scanned from disk). After initial startup, or later startups, it can also have all containers added, but they should already all be there. Then we can just track previously created in the containerID table? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org