avijayanhwx commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r462320563
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
##########
@@ -325,8 +325,12 @@ public void addContainerInfo(long containerID,
Pipeline pipeline) throws IOException {
Preconditions.checkNotNull(containerInfo);
containers.addContainer(containerInfo);
- pipelineManager.addContainerToPipeline(pipeline.getId(),
- ContainerID.valueof(containerID));
+ if (pipeline != null) {
Review comment:
@xiaoyuyao This is a specific use case to Recon where it may be directly
trying to add a new container to its "passive SCM" metadata. For example, Recon
may have been down during an extended period of time, when a container was
created, opened and then eventually closed by SCM. At that point in time, if
Recon is started up, it receives container reports for that container, it
verifies that it was created by SCM, along with the pipeline information. If it
is a closed container, then the pipeline ID returned by SCM is spurious. All
that matters is the datanode which hosts the replica.
In SCM this is not possible since it already has the container metadata, and
the 'addContainerInfo' method is called during "createContainer" step.
##########
File path:
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -105,18 +130,32 @@ public void addNewContainer(long containerId,
ContainerInfo containerInfo = containerWithPipeline.getContainerInfo();
getLock().lock();
try {
- if (getPipelineManager().containsPipeline(
- containerWithPipeline.getPipeline().getId())) {
- getContainerStateManager().addContainerInfo(containerId, containerInfo,
- getPipelineManager(), containerWithPipeline.getPipeline());
+ boolean success = false;
+ if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)) {
+ PipelineID pipelineID = containerWithPipeline.getPipeline().getId();
+ if (getPipelineManager().containsPipeline(pipelineID)) {
+ getContainerStateManager().addContainerInfo(containerId,
+ containerInfo, getPipelineManager(),
+ containerWithPipeline.getPipeline());
+ success = true;
+ } else {
+ // Get open container for a pipeline that Recon does not know
+ // about yet. Cannot update internal state until pipeline is synced.
+ LOG.warn(String.format(
Review comment:
Sure, will do.
##########
File path:
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
* @throws IOException on Error.
*/
public void checkAndAddNewContainer(ContainerID containerID,
+ ContainerReplicaProto.State replicaState,
DatanodeDetails datanodeDetails)
throws IOException {
if (!exists(containerID)) {
LOG.info("New container {} got from {}.", containerID,
datanodeDetails.getHostName());
ContainerWithPipeline containerWithPipeline =
scmClient.getContainerWithPipeline(containerID.getId());
- LOG.debug("Verified new container from SCM {} ",
- containerWithPipeline.getContainerInfo().containerID());
+ LOG.info("Verified new container from SCM {}, part of {} ",
+ containerID, containerWithPipeline.getPipeline().getId());
// If no other client added this, go ahead and add this container.
if (!exists(containerID)) {
addNewContainer(containerID.getId(), containerWithPipeline);
}
+ } else {
+ // Check if container state is not open. In SCM, container state
+ // changes to CLOSING first, and then the close command is pushed down
+ // to Datanodes. Recon 'learns' this from DN, and hence replica state
+ // will move container state to 'CLOSING'.
+ ContainerInfo containerInfo = getContainer(containerID);
+ if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
+ && !replicaState.equals(ContainerReplicaProto.State.OPEN)
Review comment:
@xiaoyuyao The container could be in any non-open & healthy state
(CLOSING, QUASI_CLOSED, CLOSED). When the container was OPEN Recon may have
added it to its metadata, but then it may have gone down. When it comes back
up, the container could be in any state. A unit test
(testProcessICRStateMismatch ) has been added for all the states.
##########
File path:
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
* @throws IOException on Error.
*/
public void checkAndAddNewContainer(ContainerID containerID,
+ ContainerReplicaProto.State replicaState,
DatanodeDetails datanodeDetails)
throws IOException {
if (!exists(containerID)) {
LOG.info("New container {} got from {}.", containerID,
datanodeDetails.getHostName());
ContainerWithPipeline containerWithPipeline =
scmClient.getContainerWithPipeline(containerID.getId());
- LOG.debug("Verified new container from SCM {} ",
- containerWithPipeline.getContainerInfo().containerID());
+ LOG.info("Verified new container from SCM {}, part of {} ",
+ containerID, containerWithPipeline.getPipeline().getId());
// If no other client added this, go ahead and add this container.
if (!exists(containerID)) {
addNewContainer(containerID.getId(), containerWithPipeline);
}
+ } else {
+ // Check if container state is not open. In SCM, container state
+ // changes to CLOSING first, and then the close command is pushed down
+ // to Datanodes. Recon 'learns' this from DN, and hence replica state
+ // will move container state to 'CLOSING'.
+ ContainerInfo containerInfo = getContainer(containerID);
+ if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
+ && !replicaState.equals(ContainerReplicaProto.State.OPEN)
Review comment:
A sample flow of events.
- Recon has containerState = OPEN, Recon gets container report with
replicaState = QUASI_CLOSED
- Before processing the replica, it calls on the above logic
(ReconContainerManager#checkAndAddNewContainer) which updates containerState to
‘CLOSING’ through the "FINALIZE" event. This is like a pre-processing step for
Recon where the container replica state is showing up as unexpected. We have to
do this since SCM is where the state transition was initiated, but Recon has to
"learn" it later from DNs.
- Further along in the same process container report flow, there is a
process replica step which sees that the containerState = ‘CLOSING’, and
replicaState = QUASI_CLOSED, hence moves container state to ‘QUASI_CLOSED’
(AbstractContainerReportHandler#updateContainerState). This is the same code as
used by SCM to transition container state by looking at the replica state.
The same sequence of events would be true if starting replicaState =
“CLOSING" or "CLOSED” as well
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]