[
https://issues.apache.org/jira/browse/HDDS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hanisha Koneru reassigned HDDS-3586:
------------------------------------
Assignee: Hanisha Koneru
> OM HA can be started with 3 isolated LEADER instead of one OM ring
> ------------------------------------------------------------------
>
> Key: HDDS-3586
> URL: https://issues.apache.org/jira/browse/HDDS-3586
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Hanisha Koneru
> Priority: Critical
>
> Steps to reproduce:
> Imagine that I have 3 different OMs with the following DNS names:
> {code}
> ozone-om-0.ozone-om
> ozone-om-1.ozone-om
> ozone-om-2.ozone-om
> {code}
> I configured the three hosts as follows:
> {code}
> OZONE-SITE.XML_ozone.om.nodes.omservice: om1,om2,om3
> OZONE-SITE.XML_ozone.om.address.omservice.om1: ozone-om-0
> OZONE-SITE.XML_ozone.om.address.omservice.om2: ozone-om-1
> OZONE-SITE.XML_ozone.om.address.omservice.om3: ozone-om-2
> OZONE-SITE.XML_ozone.om.ratis.enable: "true"
> {code}
> But unfortunately the DNS is not reliable. Each host can resolve only its
> own LOCAL hostname.
> OMHANodeDetails.java ignores EVERY configured address that is not resolvable:
> {code}
> if (!addr.isUnresolved()) {
>   if (!isPeer && OmUtils.isAddressLocal(addr)) {
>     localRpcAddress = addr;
>     localOMServiceId = serviceId;
>     localOMNodeId = nodeId;
>     localRatisPort = ratisPort;
>     found++;
>   } else {
>     // This OMNode belongs to same OM service as the current OMNode.
>     // Add it to peerNodes list.
>     peerNodesList.add(getHAOMNodeDetails(conf, serviceId,
>         nodeId, addr, ratisPort));
>   }
> }
> {code}
> As a result I have 3 running servers, but each one forms its own one-node
> Ratis ring (peerNodesList is empty as only the local hostname can be resolved).
> The group ID is the same for all of them, but they have separate databases
> and work as separate OMs, which is VERY dangerous.
> 1. Option one: we can accept any unresolved address and retry creating the
> connection if it cannot be established.
> 2. Option two: at least the error handling should be fixed. When I configure
> 3 OMs, there should be 3 OMs.
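For illustration, option two could look roughly like the sketch below: instead of silently dropping unresolved addresses, the startup code collects them and refuses to start. This is only a minimal sketch, not the actual Ozone code. The class OmRingValidator, its methods, and the port 9872 are hypothetical names and values chosen for the example; only InetSocketAddress and its isUnresolved() / createUnresolved() methods come from the JDK.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Ozone): checks that every configured
 * OM node has a resolvable address before the Ratis ring is built.
 */
final class OmRingValidator {

  private OmRingValidator() {
  }

  /** Returns the IDs of configured nodes whose address is unresolved. */
  static List<String> unresolvedNodes(
      Map<String, InetSocketAddress> configured) {
    List<String> unresolved = new ArrayList<>();
    for (Map.Entry<String, InetSocketAddress> e : configured.entrySet()) {
      if (e.getValue().isUnresolved()) {
        unresolved.add(e.getKey());
      }
    }
    return unresolved;
  }

  /**
   * Fails fast if any configured node cannot be resolved, so a 3-node
   * configuration can never quietly degrade into a 1-node ring.
   */
  static void validate(Map<String, InetSocketAddress> configured) {
    List<String> unresolved = unresolvedNodes(configured);
    if (!unresolved.isEmpty()) {
      throw new IllegalStateException("Configured " + configured.size()
          + " OM nodes, but these could not be resolved: " + unresolved);
    }
  }
}
```

With the configuration from the description, a host that resolves only its own name would report om2 and om3 (or the equivalent pair) as unresolved and stop, rather than electing itself LEADER of a single-node ring. Option one would instead keep the unresolved entries in the peer list and retry resolution later.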
--
This message was sent by Atlassian Jira
(v8.3.4#803005)