Stephen O'Donnell created HDDS-3107:
---------------------------------------
Summary: Pipelines may not be rack aware on cluster startup
Key: HDDS-3107
URL: https://issues.apache.org/jira/browse/HDDS-3107
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Components: SCM
Affects Versions: 0.6.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
Given a 6-node cluster with 2 racks (3 nodes per rack), it is possible for
pipelines to be created in a non-rack-aware way on startup.
Using a robot test like the one in HDDS-3084, I can intermittently see that if
all nodes from one rack register first, pipeline creation is triggered on them,
resulting in a pipeline that is entirely on one rack. Then the next 3 nodes
register and, as there are no nodes available on the other rack, they too end
up in a "one rack" pipeline.
This log snippet shows this happening. I will attach the full docker-compose
log:
{code}
egrep "Sending CreatePipelineCommand|Registered Data node|Created pipe" docker-ozone-topology-ozone-topology-readdata-scm.log
scm_1 | 2020-02-28 12:27:57,826 [IPC Server handler 6 on 9861] INFO
node.SCMNodeManager: Registered Data node :
74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host:
ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}
scm_1 | 2020-02-28 12:27:57,840 [IPC Server handler 9 on 9861] INFO
node.SCMNodeManager: Registered Data node :
32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host:
ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}
scm_1 | 2020-02-28 12:27:57,903 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=16806a56-8e35-46b2-aefd-cb5232d6f5f7 to
datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
scm_1 | 2020-02-28 12:27:57,924 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
16806a56-8e35-46b2-aefd-cb5232d6f5f7, Nodes:
32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host:
ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:57.891553Z]
scm_1 | 2020-02-28 12:27:57,932 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=5a3edf1e-84f6-48ef-a333-6f3e924898a6 to
datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
scm_1 | 2020-02-28 12:27:57,933 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
5a3edf1e-84f6-48ef-a333-6f3e924898a6, Nodes:
74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host:
ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:57.932422Z]
scm_1 | 2020-02-28 12:27:58,213 [IPC Server handler 8 on 9861] INFO
node.SCMNodeManager: Registered Data node :
4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host:
ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}
scm_1 | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=ba2034fc-cb11-482a-9843-435294862240 to
datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
scm_1 | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
ba2034fc-cb11-482a-9843-435294862240, Nodes:
4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host:
ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:58.216275Z]
scm_1 | 2020-02-28 12:27:58,218 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to
datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
scm_1 | 2020-02-28 12:27:58,219 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to
datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
scm_1 | 2020-02-28 12:27:58,220 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to
datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
scm_1 | 2020-02-28 12:27:58,221 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
4f16913d-ec06-44b4-a577-6664a517e401, Nodes:
4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host:
ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host:
ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host:
ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1,
certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:58.218896Z]
scm_1 | 2020-02-28 12:27:58,645 [IPC Server handler 7 on 9861] INFO
node.SCMNodeManager: Registered Data node :
66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host:
ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}
scm_1 | 2020-02-28 12:27:58,645 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=4739840f-8bb3-4742-ac5e-ac519b51e0fd to
datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
scm_1 | 2020-02-28 12:27:58,647 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
4739840f-8bb3-4742-ac5e-ac519b51e0fd, Nodes:
66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host:
ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:58.645455Z]
scm_1 | 2020-02-28 12:27:59,339 [IPC Server handler 7 on 9861] INFO
node.SCMNodeManager: Registered Data node :
9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host:
ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}
scm_1 | 2020-02-28 12:27:59,340 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=555b9a1d-1c4a-4d9f-b198-492da7005ccd to
datanode:9be38eea-bacc-434a-876d-50b105d4daa2
scm_1 | 2020-02-28 12:27:59,341 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
555b9a1d-1c4a-4d9f-b198-492da7005ccd, Nodes:
9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host:
ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:59.340193Z]
scm_1 | 2020-02-28 12:27:59,672 [IPC Server handler 6 on 9861] INFO
node.SCMNodeManager: Registered Data node :
cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host:
ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}
scm_1 | 2020-02-28 12:27:59,673 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=a6d77ef7-52c0-4f6a-8c22-f0b405da08a1 to
datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
scm_1 | 2020-02-28 12:27:59,674 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
a6d77ef7-52c0-4f6a-8c22-f0b405da08a1, Nodes:
cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host:
ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
CreationTimestamp2020-02-28T12:27:59.673585Z]
scm_1 | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to
datanode:9be38eea-bacc-434a-876d-50b105d4daa2
scm_1 | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to
datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
scm_1 | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO
pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to
datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
scm_1 | 2020-02-28 12:27:59,684 [RatisPipelineUtilsThread] INFO
pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
70cfd35d-b778-42df-bcba-3ba14bd8ead0, Nodes:
9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host:
ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2,
certSerialId: null}66ec72b2-4be5-453f-ac44-cc9857bad
{code}
I believe there are a few things to consider here:
1) Do we need a better way to determine whether rack awareness is enabled?
Currently we check the network topology for the number of rack nodes, but these
are only created as the datanodes register. Should we use the cluster map to
determine the intended number of racks on the cluster?
2) Should we fall back to non-rack-aware placement so easily? Pipelines are
long-lived, and if they are created non-rack-aware, they may stay that way
indefinitely. Maybe we need to delay pipeline creation on startup until the
node count settles?
3) If a pipeline or new container is placed non-rack-aware in a rack-aware
cluster, should we complain loudly in the logs, in JMX and in Recon?
4) Do we need something that checks for non-rack-aware pipelines and fixes them
where possible? E.g. if we have 2 racks and stop 1 rack, then we must create a
non-rack-aware pipeline to keep on writing, but when the other rack is
restarted, that pipeline should be destroyed and a new rack-aware one created.
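The detection asked about in points 3 and 4 could look roughly like the Java sketch below. It is not Ozone code. The class `PipelineRackCheck` and its methods are hypothetical. The node-to-rack map is assumed to come from the intended cluster map (as point 1 suggests), not only from nodes that have already registered. The names `dn1`..`dn6` stand in for the six datanodes in the log above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Ozone. nodeToRack is ASSUMED to describe the
// intended topology (e.g. from the configured cluster map), so it already knows
// about racks whose datanodes have not registered yet.
final class PipelineRackCheck {

  private PipelineRackCheck() {
  }

  // Number of distinct racks the cluster is expected to have.
  static int expectedRacks(Map<String, String> nodeToRack) {
    return new HashSet<>(nodeToRack.values()).size();
  }

  // True when the cluster has 2+ racks but every member of a multi-node
  // pipeline sits on the same rack (e.g. a Factor:THREE pipeline all on /rack1).
  static boolean isNonRackAware(List<String> pipelineNodes,
      Map<String, String> nodeToRack) {
    if (pipelineNodes.size() < 2 || expectedRacks(nodeToRack) < 2) {
      return false; // nothing to spread: Factor:ONE pipeline or single-rack cluster
    }
    Set<String> racks = new HashSet<>();
    for (String node : pipelineNodes) {
      String rack = nodeToRack.get(node);
      if (rack == null) {
        throw new IllegalArgumentException("Node not in cluster map: " + node);
      }
      racks.add(rack);
    }
    return racks.size() == 1;
  }

  // Point 4: a non-rack-aware pipeline is a candidate for replacement once a
  // healthy node on some other rack is available again.
  static boolean shouldRecreate(List<String> pipelineNodes,
      Map<String, String> nodeToRack, Set<String> healthyNodes) {
    if (!isNonRackAware(pipelineNodes, nodeToRack)) {
      return false;
    }
    String pipelineRack = nodeToRack.get(pipelineNodes.get(0));
    for (String node : healthyNodes) {
      String rack = nodeToRack.get(node);
      if (rack != null && !rack.equals(pipelineRack)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, String> topology = Map.of(
        "dn1", "/rack1", "dn2", "/rack1", "dn3", "/rack1",
        "dn4", "/rack2", "dn5", "/rack2", "dn6", "/rack2");
    List<String> oneRack = List.of("dn1", "dn3", "dn2");
    // true: all three members are on /rack1
    System.out.println(isNonRackAware(oneRack, topology));
    // false: only /rack1 nodes are healthy, so there is nowhere better to go
    System.out.println(shouldRecreate(oneRack, topology, Set.of("dn1", "dn2", "dn3")));
  }
}
```

This kind of check only helps with the startup race if the rack map is known before all datanodes register. Built from the registered-node topology alone, the map would show a single rack, and the check would stay silent for exactly the pipeline in the log.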
--
This message was sent by Atlassian Jira
(v8.3.4#803005)