[ 
https://issues.apache.org/jira/browse/HDDS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2848:
------------------------------------
    Description: 
Recon talks to OM in two ways: through HTTP to get the DB snapshot, and through 
RPC to get delta updates.

Since Recon already uses the OzoneManagerClientProtocol to query the 
OzoneManager over RPC, the RPC client automatically routes requests to the 
leader on an OM HA cluster. Recon only needs the updates from the OM RocksDB 
store, and does not need the in-flight updates in the OM DoubleBuffer. Because 
Ratis guarantees that the leader’s RocksDB will always be up to date, Recon 
does not need to worry about going back in time when the current OM leader goes 
down. We have to pass the OM service ID to the Ozone Manager client in Recon, 
and the failover then works internally. Currently we pass in 'null'.
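
As a rough illustration of why the service ID matters, the sketch below 
resolves a service ID to the list of OM RPC addresses a client could fail over 
between. It is not the Ozone client code: the class OmServiceIdResolver and the 
plain Map used as configuration are hypothetical, and the key names 
(ozone.om.address, ozone.om.nodes.<serviceId>, 
ozone.om.address.<serviceId>.<nodeId>) are assumed to follow the OM HA 
configuration convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Ozone.
final class OmServiceIdResolver {

  /**
   * Returns the OM RPC addresses known for the given service ID.
   * With a null service ID only the single non-HA address is visible,
   * so there is nothing to fail over to.
   */
  static List<String> resolveAddresses(Map<String, String> conf, String serviceId) {
    List<String> addresses = new ArrayList<>();
    if (serviceId == null) {
      // Assumed non-HA key: one OM address, no failover candidates.
      String single = conf.get("ozone.om.address");
      if (single != null) {
        addresses.add(single);
      }
      return addresses;
    }
    // Assumed HA keys: a comma-separated node list, then one address per node.
    String nodes = conf.getOrDefault("ozone.om.nodes." + serviceId, "");
    for (String rawId : nodes.split(",")) {
      String nodeId = rawId.trim();
      if (nodeId.isEmpty()) {
        continue;
      }
      String address = conf.get("ozone.om.address." + serviceId + "." + nodeId);
      if (address != null) {
        addresses.add(address);
      }
    }
    return addresses;
  }
}
```

With a service ID the caller sees every OM in the ring; with 'null' it sees at 
most one address, which is the situation described above.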

To make the HTTP call work against OM HA, Recon has to find out the current 
OM leader and download the snapshot from that OM instance. We can reuse the 
approach implemented in 
org.apache.hadoop.ozone.admin.om.GetServiceRolesSubcommand: get the roles of 
the OM instances and then determine the leader from them.
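
A minimal sketch of the leader lookup, assuming the roles have already been 
fetched into a simple list. The types here (OmLeaderLookup, OmNode) are 
hypothetical stand-ins rather than Ozone classes, and the snapshot path is 
passed in by the caller because the actual endpoint is not stated in this 
issue.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Ozone.
final class OmLeaderLookup {

  /** One OM instance as reported by a roles query (assumed shape). */
  static final class OmNode {
    final String nodeId;
    final String hostname;
    final int httpPort;
    final String role;

    OmNode(String nodeId, String hostname, int httpPort, String role) {
      this.nodeId = nodeId;
      this.hostname = hostname;
      this.httpPort = httpPort;
      this.role = role;
    }
  }

  /** Returns the first instance whose role is LEADER, if there is one. */
  static Optional<OmNode> findLeader(List<OmNode> nodes) {
    return nodes.stream()
        .filter(n -> "LEADER".equalsIgnoreCase(n.role))
        .findFirst();
  }

  /**
   * Builds the HTTP URL for the snapshot download on the current leader.
   * Returns empty when no leader is known, e.g. during an election.
   */
  static Optional<String> snapshotUrl(List<OmNode> nodes, String snapshotPath) {
    return findLeader(nodes)
        .map(l -> "http://" + l.hostname + ":" + l.httpPort + snapshotPath);
  }
}
```

Returning an empty Optional when no leader is reported lets the caller retry 
later instead of downloading a snapshot from a follower.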

  was:Since Recon already uses the OzoneManagerClientProtocol to query the 
OzoneManager, on an OM HA cluster, the RPC client automatically routes the 
request to the leader. Recon only needs the updates from the OM RocksDB store, 
and does not need the in-flight updates in the OM DoubleBuffer. Due to the 
guarantee from Ratis that the leader’s RocksDB will always be up to date, Recon 
does not need to worry about going back in time when a current OM leader goes 
down. In this version of Recon, the plan is to test out OM HA scenarios and 
add unit/integration tests to verify correct behavior.


> Recon should work with OM HA
> ----------------------------
>
>                 Key: HDDS-2848
>                 URL: https://issues.apache.org/jira/browse/HDDS-2848
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Recon
>            Reporter: Aravindan Vijayan
>            Priority: Major
>
> Recon talks to OM in two ways: through HTTP to get the DB snapshot, and 
> through RPC to get delta updates.
> Since Recon already uses the OzoneManagerClientProtocol to query the 
> OzoneManager over RPC, the RPC client automatically routes requests to the 
> leader on an OM HA cluster. Recon only needs the updates from the OM RocksDB 
> store, and does not need the in-flight updates in the OM DoubleBuffer. 
> Because Ratis guarantees that the leader’s RocksDB will always be up to date, 
> Recon does not need to worry about going back in time when the current OM 
> leader goes down. We have to pass the OM service ID to the Ozone Manager 
> client in Recon, and the failover then works internally. Currently we pass in 
> 'null'.
> To make the HTTP call work against OM HA, Recon has to find out the current 
> OM leader and download the snapshot from that OM instance. We can reuse the 
> approach implemented in 
> org.apache.hadoop.ozone.admin.om.GetServiceRolesSubcommand: get the roles of 
> the OM instances and then determine the leader from them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
