[ 
https://issues.apache.org/jira/browse/HDDS-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-3270:
------------------------------------
    Description: 
As part of the SCM safemode process, there are some rules which must pass 
before safemode can be exited.

One of these rules is the number of registered datanodes and another is that at 
least one pipeline must be created and open.

Currently, pipeline creation is attempted each time a node registers. As soon 
as the 3rd node registers, pipelines will be created.

There are two issues with this:

1. With network topology, if the first 3 nodes are all from the same rack, a 
non-rack-aware pipeline will get created.

2. With multi-raft, it would be better to wait until more nodes are registered, 
so that the multiple pipelines per node can be spread across all the available 
nodes.

The proposal here is to introduce a new sub-state into the safemode process, 
called "preCheckComplete". When adding rules to the Safemode Manager, some rules 
can be tagged as "preCheck" (e.g. number of datanodes registered). When all 
the pre-check rules have passed, a notification will be sent to all safemode 
listeners:

{code}
  safeModeIsOn -> true
  preCheckComplete -> true
{code}

That will allow a listener to act as soon as this first stage completes. In 
the case of PipelineManager, it will then allow pipelines to be created.
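
As a rough illustration of the listener side, the sketch below holds back 
pipeline creation until a notification with preCheckComplete set to true 
arrives. Everything in it is an assumption made for illustration: the class 
DeferredPipelineCreator, its method names, and the naive "group nodes in 
threes" placement are hypothetical and are not the real PipelineManager API 
or its placement policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pipeline manager; not the real Ozone class. */
class DeferredPipelineCreator {
  private static final int NODES_PER_PIPELINE = 3;

  private final List<String> unassigned = new ArrayList<>();
  private final List<List<String>> pipelines = new ArrayList<>();
  private boolean creationAllowed = false;

  /** Called whenever a datanode registers. */
  void onNodeRegistered(String node) {
    unassigned.add(node);
    createPipelinesIfAllowed();
  }

  /** Called for every safemode notification (assumed two-flag payload). */
  void onSafeModeStatus(boolean safeModeIsOn, boolean preCheckComplete) {
    if (preCheckComplete) {
      creationAllowed = true;
    }
    createPipelinesIfAllowed();
  }

  int pipelineCount() {
    return pipelines.size();
  }

  /** Placeholder placement: take waiting nodes three at a time. */
  private void createPipelinesIfAllowed() {
    if (!creationAllowed) {
      return; // pre-check rules have not all passed yet
    }
    while (unassigned.size() >= NODES_PER_PIPELINE) {
      List<String> head = unassigned.subList(0, NODES_PER_PIPELINE);
      pipelines.add(new ArrayList<>(head));
      head.clear(); // removes the chosen nodes from the waiting list
    }
  }
}
```

With this shape, registering three nodes before the pre-check notification 
creates nothing; the pipelines are only built once the notification arrives, 
by which point a real placement policy would have more nodes and racks to 
choose from.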

Once the remaining rules have passed, safemode will exit as normal by 
sending a second event:

{code}
  safeModeIsOn -> false
  preCheckComplete -> true
{code}
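
To make the two-stage flow concrete, here is a minimal sketch of how a 
safemode manager might track tagged rules and emit the two events described 
above. It is not the existing SCM code: the class SafeModeSketch, the Listener 
interface, and the addRule/validate methods are hypothetical names chosen for 
this example, and rules are modelled as simple boolean checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical two-stage safemode tracker; not the actual SCM classes. */
class SafeModeSketch {

  /** Receives the two flags carried by each notification. */
  interface Listener {
    void onStatus(boolean safeModeIsOn, boolean preCheckComplete);
  }

  /** A named exit rule, optionally tagged as a pre-check. */
  private static final class Rule {
    final String name;
    final boolean preCheck;
    final BooleanSupplier check;

    Rule(String name, boolean preCheck, BooleanSupplier check) {
      this.name = name;
      this.preCheck = preCheck;
      this.check = check;
    }
  }

  private final List<Rule> rules = new ArrayList<>();
  private final List<Listener> listeners = new ArrayList<>();
  private boolean safeModeIsOn = true;
  private boolean preCheckComplete = false;

  void addRule(String name, boolean preCheck, BooleanSupplier check) {
    rules.add(new Rule(name, preCheck, check));
  }

  void addListener(Listener listener) {
    listeners.add(listener);
  }

  /** Re-evaluates every rule, e.g. after a datanode registers. */
  void validate() {
    if (!safeModeIsOn) {
      return; // already exited, nothing more to report
    }
    boolean preChecksPass = true;
    boolean allPass = true;
    for (Rule rule : rules) {
      boolean ok = rule.check.getAsBoolean();
      allPass &= ok;
      if (rule.preCheck) {
        preChecksPass &= ok;
      }
    }
    if (preChecksPass && !preCheckComplete) {
      preCheckComplete = true;
      notifyListeners(); // first event: safeModeIsOn=true, preCheckComplete=true
    }
    if (allPass) {
      safeModeIsOn = false;
      notifyListeners(); // second event: safeModeIsOn=false, preCheckComplete=true
    }
  }

  private void notifyListeners() {
    for (Listener listener : listeners) {
      listener.onStatus(safeModeIsOn, preCheckComplete);
    }
  }
}
```

In this sketch a datanode-count rule would be added with preCheck set to true 
and an open-pipeline rule with preCheck set to false. Listeners hear nothing 
while the pre-check rules are failing, get one notification when they all 
pass, and a second one when every rule passes and safemode exits.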

> Allow safemode listeners to be notified when some precheck rules pass
> ---------------------------------------------------------------------
>
>                 Key: HDDS-3270
>                 URL: https://issues.apache.org/jira/browse/HDDS-3270
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>


