[ https://issues.apache.org/jira/browse/HDDS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanisha Koneru resolved HDDS-3586.
----------------------------------
    Resolution: Fixed

> OM HA can be started with 3 isolated LEADER instead of one OM ring
> ------------------------------------------------------------------
>
>                 Key: HDDS-3586
>                 URL: https://issues.apache.org/jira/browse/HDDS-3586
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Assignee: Hanisha Koneru
>            Priority: Critical
>              Labels: Triaged, pull-request-available
>
> Steps to reproduce:
> Imagine that I have 3 different OM instances with the following DNS names:
> {code}
> ozone-om-0.ozone-om
> ozone-om-1.ozone-om
> ozone-om-2.ozone-om
> {code}
> I configured the three hosts as follows:
> {code}
> OZONE-SITE.XML_ozone.om.nodes.omservice: om1,om2,om3
> OZONE-SITE.XML_ozone.om.address.omservice.om1: ozone-om-0
> OZONE-SITE.XML_ozone.om.address.omservice.om2: ozone-om-1
> OZONE-SITE.XML_ozone.om.address.omservice.om3: ozone-om-2
> OZONE-SITE.XML_ozone.om.ratis.enable: "true"
> {code}
> Unfortunately, DNS is not reliable here: each host can resolve only its own
> LOCAL hostname.
> OMHANodeDetails.java silently ignores ALL configured OM addresses that cannot
> be resolved:
> {code}
> // Unresolved addresses are skipped here without any error or warning.
> if (!addr.isUnresolved()) {
>   if (!isPeer && OmUtils.isAddressLocal(addr)) {
>     localRpcAddress = addr;
>     localOMServiceId = serviceId;
>     localOMNodeId = nodeId;
>     localRatisPort = ratisPort;
>     found++;
>   } else {
>     // This OMNode belongs to same OM service as the current OMNode.
>     // Add it to peerNodes list.
>     peerNodesList.add(getHAOMNodeDetails(conf, serviceId,
>         nodeId, addr, ratisPort));
>   }
> }
> {code}
> As a result, I end up with 3 running servers, each with its own one-node
> Ratis ring (peerNodesList is empty because only the local hostname can be
> resolved).
> The Ratis group ID is the same for all of them, but they have separate
> databases and work as independent OMs, which is VERY dangerous.
>
> 1. Option one: accept any unresolved address, and retry creating the
> connection if it cannot be established.
> 2. Option two: at minimum, fix the error handling. When 3 OMs are
> configured, there are supposed to be 3 OMs; anything else should be reported
> as an error.
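Option two above can be sketched as a fail-fast check that runs before the Ratis ring is built. This is a minimal illustration under stated assumptions, not the change that resolved HDDS-3586: the class `OmPeerValidation`, its methods, and port 9862 are hypothetical; only the node IDs and host names come from the configuration in the report. The loopback address stands in for the one hostname each host can resolve, and `InetSocketAddress.createUnresolved` mimics the peers whose DNS lookup failed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Ozone code): refuse to start when any configured
// OM address is unresolved, instead of silently shrinking the Ratis ring.
public class OmPeerValidation {

  // Returns the IDs of configured OM nodes whose address is unresolved.
  public static List<String> unresolvedNodes(
      Map<String, InetSocketAddress> nodes) {
    List<String> unresolved = new ArrayList<>();
    for (Map.Entry<String, InetSocketAddress> e : nodes.entrySet()) {
      if (e.getValue().isUnresolved()) {
        unresolved.add(e.getKey());
      }
    }
    return unresolved;
  }

  // Throws instead of letting the OM start with fewer peers than configured.
  public static void validateAllResolved(String serviceId,
      Map<String, InetSocketAddress> nodes) {
    List<String> unresolved = unresolvedNodes(nodes);
    if (!unresolved.isEmpty()) {
      throw new IllegalStateException(String.format(
          "OM service %s: %d of %d configured OM nodes are unresolvable %s;"
              + " refusing to start with a partial Ratis ring",
          serviceId, unresolved.size(), nodes.size(), unresolved));
    }
  }

  public static void main(String[] args) {
    Map<String, InetSocketAddress> nodes = new LinkedHashMap<>();
    // Loopback stands in for the local hostname, which does resolve.
    nodes.put("om1",
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862));
    // createUnresolved() skips the DNS lookup, like the unreachable peers.
    nodes.put("om2", InetSocketAddress.createUnresolved("ozone-om-1", 9862));
    nodes.put("om3", InetSocketAddress.createUnresolved("ozone-om-2", 9862));
    try {
      validateAllResolved("omservice", nodes);
    } catch (IllegalStateException ex) {
      System.out.println(ex.getMessage());
    }
  }
}
```

Failing fast turns the silent split into three isolated leaders into a visible startup error. Option one would instead keep unresolved peers in `peerNodesList` and retry the connection later; this sketch does not cover that path.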