[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305187064
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java
 ##
 @@ -123,6 +123,9 @@ private OMConfigKeys() {
   "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String
   OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+  "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;
 
 
 Review comment:
   Filed [HDDS-1831](https://issues.apache.org/jira/browse/HDDS-1831).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305186358
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java
 ##
 @@ -123,6 +123,9 @@ private OMConfigKeys() {
   "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String
   OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+  "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;
 
 
 Review comment:
   Good suggestion! Let me file a followup jira to fix that. Want to get this 
patch committed today, it's been hanging around for over a month.





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305030343
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
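     <description>Byte limit for Raft's Log Worker queue.
     </description>
   </property>
+  <property>
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge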
 
 Review comment:
   1024 transactions is 100ms worth of edits in a busy cluster. We could set 
this as high as 1M maybe to keep more history. :)
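
   To make the arithmetic behind these numbers explicit, here is a minimal sketch. It assumes the rate implied by the comment (1024 transactions per 100 ms, roughly 10,000 per second) and a steady load. The class and method names are hypothetical and not part of Ozone or Ratis.

```java
/** Hypothetical helper for illustration; not part of Ozone or Ratis. */
final class PurgeGapEstimate {
  private PurgeGapEstimate() { }

  /**
   * Returns how many seconds of log history a given purge gap retains,
   * assuming a steady transaction rate.
   */
  static double secondsOfHistory(long purgeGap, double txPerSecond) {
    if (txPerSecond <= 0) {
      throw new IllegalArgumentException("txPerSecond must be positive");
    }
    return purgeGap / txPerSecond;
  }
}
```

   At 10,240 transactions per second, `secondsOfHistory(1024, 10240.0)` is about 0.1 seconds, while `secondsOfHistory(1_000_000, 10240.0)` is a little under 100 seconds of history.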





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305029959
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
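     <description>Byte limit for Raft's Log Worker queue.
     </description>
   </property>
+  <property>
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge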
 
 Review comment:
   Let's set this to a higher value. We don't need to be too aggressive about 
purging Ratis logs.





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305027589
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer ratisServer) {
 ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true)
 .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build();
 this.executorService = HadoopExecutors.newSingleThreadExecutor(build);
+this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor();
   }
 
   /**
* Initializes the State Machine with the given server, group and storage.
* TODO: Load the latest snapshot from the file system.
 
 Review comment:
   This TODO looks a little worrying. Something we need to address now?





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305026829
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");
+  

[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305026469
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");
+  

[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305025544
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -1223,6 +1231,14 @@ public void start() throws IOException {
 
 DefaultMetricsSystem.initialize("OzoneManager");
 
+// Start Ratis services
+if (omRatisServer != null) {
 
 Review comment:
   When would they be null?





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305019552
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
 ##
 @@ -159,15 +159,18 @@ public void decNumKeys() {
   }
 
   public void setNumVolumes(long val) {
-this.numVolumes.incr(val);
+long oldVal = this.numVolumes.value();
 
 Review comment:
   Nice catch!
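
   For readers skimming the hunk, a minimal sketch of why the old line was a bug. The quoted diff only shows the first line of the fix; the `incr(val - oldVal)` step below is an assumption about how it continues. `IncrOnlyCounter` is a hypothetical stand-in for an increment-only metrics counter, not the Hadoop class.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a counter that can only be incremented. */
final class IncrOnlyCounter {
  private final AtomicLong value = new AtomicLong();

  void incr(long delta) {
    value.addAndGet(delta);
  }

  long value() {
    return value.get();
  }
}

final class VolumeMetricsSketch {
  private final IncrOnlyCounter numVolumes = new IncrOnlyCounter();

  /** Old behavior: adds val on top of the current value instead of setting it. */
  void setNumVolumesBuggy(long val) {
    numVolumes.incr(val);
  }

  /** Assumed fix: increment by the difference so the counter ends up at val. */
  void setNumVolumes(long val) {
    long oldVal = numVolumes.value();
    numVolumes.incr(val - oldVal);
  }

  long getNumVolumes() {
    return numVolumes.value();
  }
}
```

   Calling the buggy setter with 5 and then 3 leaves the counter at 8, while the fixed setter leaves it at 3. The read-then-increment pair in this sketch is not atomic, which only matters if setters can race.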





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305018334
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
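     <description>Byte limit for Raft's Log Worker queue.
     </description>
   </property>
+  <property>
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge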
 
 Review comment:
   Does this mean we will snapshot every 1024 transactions?





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293606893
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+  return null;
+}
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Stop the ratis server so that no new transactions are applied. This
+// can happen if a leader election happens while the state is being
+// re-initialized.
+omRatisServer.stop();
 
 Review comment:
   One risk is that this code path for stop may not be well tested.





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293607099
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+  return null;
+}
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Stop the ratis server so that no new transactions are applied. This
+// can happen if a leader election happens while the state is being
+// re-initialized.
+omRatisServer.stop();
+
+// Clear the OM Double Buffer so that if there are any pending
+// transactions in the buffer, they are discarded.
+omDoubleBuffer.stop();
 
 Review comment:
   `omDoubleBuffer.stop` interrupts the thread but does not call `join`. So the 
doubleBuffer thread may still be running when the stop call returns.
   
   We should probably fix stop to call `join`.
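
   A minimal sketch of the interrupt-then-join pattern suggested here. `FlushWorker` is a hypothetical stand-in for the double-buffer flush thread, not the actual `OzoneManagerDoubleBuffer` code; the sleep stands in for waiting on and flushing a batch.

```java
/** Hypothetical worker standing in for the double-buffer flush thread. */
final class FlushWorker {
  private final Thread thread;
  private volatile boolean running = true;

  FlushWorker() {
    thread = new Thread(this::loop, "FlushWorker");
    thread.setDaemon(true);
  }

  void start() {
    thread.start();
  }

  private void loop() {
    while (running) {
      try {
        // Stands in for waiting on and flushing a batch of transactions.
        Thread.sleep(50);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and exit the loop.
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /**
   * Interrupts the worker and then waits for it to exit, so that the
   * thread is no longer running when this method returns normally.
   */
  void stop() {
    running = false;
    thread.interrupt();
    try {
      thread.join();
    } catch (InterruptedException e) {
      // The caller was interrupted while waiting; the worker may still be alive.
      Thread.currentThread().interrupt();
    }
  }

  boolean isAlive() {
    return thread.isAlive();
  }
}
```

   Without the `join`, a caller that moves the DB directory right after `stop()` returns could race with a flush that is still in progress.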





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293607863
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+  return null;
+}
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Stop the ratis server so that no new transactions are applied. This
+// can happen if a leader election happens while the state is being
+// re-initialized.
+omRatisServer.stop();
+
+// Clear the OM Double Buffer so that if there are any pending
+// transactions in the buffer, they are discarded.
+omDoubleBuffer.stop();
+
+// Take a backup of the current DB
+File dbFile = metadataManager.getStore().getDbLocation();
+String dbBackupFileName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackupFile = new File(dbFile.getParentFile(), dbBackupFileName);
+
+try {
+  Files.move(dbFile.toPath(), dbBackupFile.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.", e);
+  return null;
+}
+
+// Move the downloaded DB checkpoint into the om metadata dir
+Path checkpointPath = omDBcheckpoint.getCheckpointLocation();
+try {
+  Files.move(checkpointPath, dbFile.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to move downloaded DB checkpoint {} to metadata " +
+  "directory {}", checkpointPath, dbFile.toPath(), e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint
+try {
+  reloadOMState();
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Re-instantiate MetadataManager with new DB checkpoint.
+   * All the classes which use/ store MetadataManager should also be updated
+   * with the new MetadataManager instance.
+   */
+  private void reloadOMState() throws IOException {
+
+metadataManager = new OmMetadataManagerImpl(configuration);
+
+metadataManager.start(configuration);
+
+// Set metrics and start metrics back ground thread
+metrics.setNumVolumes(metadataManager.countRowsInTable(metadataManager
+.getVolumeTable()));
+metrics.setNumBuckets(metadataManager.countRowsInTable(metadataManager
+.getBucketTable()));
+
+// Delete the omMetrics file if it exists
+Files.deleteIfExists(getMetricsStorageFile().toPath());
+
+// Re-initialize metadataManager dependent implementations
 
 Review comment:
   Can some of this code be shared with startup initialization by moving to a 
common function?
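
   One way the sharing could look, as a rough sketch only. `ReloadableManager` and its string fields are hypothetical placeholders for `OzoneManager` and its metadataManager-dependent members; the point is just that the constructor and the reload path call the same private method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a manager with reloadable state. */
final class ReloadableManager {
  private final List<String> initLog = new ArrayList<>();
  private String store;
  private String volumeManager;
  private String bucketManager;

  /** Startup path. */
  ReloadableManager(String dbPath) {
    instantiateServices(dbPath);
  }

  /** Reload path, called after a new checkpoint has been moved into place. */
  void reloadState(String dbPath) {
    instantiateServices(dbPath);
  }

  /** Single place that builds everything that depends on the store. */
  private void instantiateServices(String dbPath) {
    store = "store@" + dbPath;
    volumeManager = "volumes(" + store + ")";
    bucketManager = "buckets(" + store + ")";
    initLog.add(dbPath);
  }

  String getVolumeManager() {
    return volumeManager;
  }

  String getBucketManager() {
    return bucketManager;
  }

  List<String> getInitLog() {
    return initLog;
  }
}
```

   With this shape, a dependent added later is wired up in one place and cannot be forgotten on the reload path.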



[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293607289
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+  return null;
+}
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Stop the ratis server so that no new transactions are applied. This
+// can happen if a leader election happens while the state is being
+// re-initialized.
+omRatisServer.stop();
+
+// Clear the OM Double Buffer so that if there are any pending
+// transactions in the buffer, they are discarded.
+omDoubleBuffer.stop();
+
+// Take a backup of the current DB
+File dbFile = metadataManager.getStore().getDbLocation();
 
 Review comment:
   This is going to be a directory, correct? I assume a DB consists of multiple 
files.
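
   If the location is indeed a directory, the backup step could look roughly like the sketch below. The class name, the `db.backup.` prefix, and the file names are illustrative assumptions, not the values used in the PR. Note that `Files.move` on a non-empty directory only succeeds when the move can be done as a rename, i.e. without moving the entries one by one, which is why the backup stays in the same parent directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DbDirBackupSketch {

  // Renames the DB directory to a sibling backup directory and returns
  // the backup path. The backup name format is illustrative only.
  static Path backupDbDir(Path dbDir, long lastAppliedIndex)
      throws IOException {
    if (!Files.isDirectory(dbDir)) {
      throw new IOException("Expected a DB directory: " + dbDir);
    }
    String backupName = "db.backup." + lastAppliedIndex + "_"
        + System.currentTimeMillis();
    // resolveSibling keeps the backup in the same parent directory, so
    // the move is a rename and the directory's entries are not copied.
    return Files.move(dbDir, dbDir.resolveSibling(backupName));
  }

  public static void main(String[] args) throws IOException {
    Path parent = Files.createTempDirectory("om-metadata");
    Path dbDir = Files.createDirectory(parent.resolve("om.db"));
    Files.createFile(dbDir.resolve("000001.sst"));

    Path backup = backupDbDir(dbDir, 100L);
    System.out.println("DB dir still exists: " + Files.exists(dbDir));
    System.out.println("Backup has SST file: "
        + Files.exists(backup.resolve("000001.sst")));
  }
}
```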





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293606799
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+  return null;
+}
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
 
 Review comment:
   How do we eventually recover from this situation? Should we retry fetching a 
more recent snapshot?
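
   A bounded retry is one way this could work. The sketch below is hypothetical: `fetchNewerSnapshotIndex`, the `LongSupplier` standing in for the checkpoint download, and the attempt and backoff parameters are all assumptions made for illustration.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

class SnapshotRetrySketch {

  // Asks the (hypothetical) snapshot source for a checkpoint index up to
  // maxAttempts times and returns the first index that is strictly newer
  // than lastAppliedIndex, or empty if none was offered.
  static OptionalLong fetchNewerSnapshotIndex(LongSupplier fetchSnapshotIndex,
      long lastAppliedIndex, int maxAttempts, long backoffMillis)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      long snapshotIndex = fetchSnapshotIndex.getAsLong();
      if (snapshotIndex > lastAppliedIndex) {
        return OptionalLong.of(snapshotIndex);
      }
      if (attempt < maxAttempts) {
        // Give the leader time to take a newer checkpoint.
        Thread.sleep(backoffMillis);
      }
    }
    return OptionalLong.empty();
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated leader whose checkpoint index advances by 10 on each fetch.
    long[] leaderIndex = {80L};
    LongSupplier leader = () -> leaderIndex[0] += 10;

    OptionalLong result = fetchNewerSnapshotIndex(leader, 100L, 5, 10L);
    System.out.println(result);
  }
}
```

   Bounding the attempts keeps the follower from looping forever when the leader never produces a newer checkpoint; the caller still has to decide what to do when the result is empty.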





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293607463
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+  return null;
+}
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Stop the ratis server so that no new transactions are applied. This
+// can happen if a leader election happens while the state is being
+// re-initialized.
+omRatisServer.stop();
+
+// Clear the OM Double Buffer so that if there are any pending
+// transactions in the buffer, they are discarded.
+omDoubleBuffer.stop();
+
+// Take a backup of the current DB
+File dbFile = metadataManager.getStore().getDbLocation();
+String dbBackupFileName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackupFile = new File(dbFile.getParentFile(), dbBackupFileName);
+
+try {
+  Files.move(dbFile.toPath(), dbBackupFile.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.", e);
+  return null;
+}
+
+// Move the downloaded DB checkpoint into the om metadata dir
+Path checkpointPath = omDBcheckpoint.getCheckpointLocation();
+try {
+  Files.move(checkpointPath, dbFile.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to move downloaded DB checkpoint {} to metadata " +
+  "directory {}",checkpointPath, dbFile.toPath(), e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint
+try {
+  reloadOMState();
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
 
 Review comment:
   I didn't understand this TODO. Could you clarify a bit more?





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293599252
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint;
+try {
+  omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
 
 Review comment:
   One question: currently we are passing `leaderId` = null. So will this call 
to `getOzoneManagerDBSnapshot` fail?
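
   Whether the call fails depends on how `getOzoneManagerDBSnapshot` treats a null ID, which is not visible in this diff. Until the Raft server supplies the leader ID, an explicit guard would at least make the outcome predictable. The sketch below is hypothetical: `downloadCheckpoint` is a stand-in for the real download and returns a label rather than a `DBCheckpoint`.

```java
import java.util.Optional;

class LeaderIdGuardSketch {

  // Hypothetical stand-in for the checkpoint download; returns a label
  // for the downloaded checkpoint, or empty when no leader is known.
  static Optional<String> downloadCheckpoint(String leaderId) {
    if (leaderId == null || leaderId.isEmpty()) {
      // Fail early with a clear reason instead of letting a null peer ID
      // reach the download code.
      System.err.println("Leader ID is unknown; skipping checkpoint download.");
      return Optional.empty();
    }
    return Optional.of("checkpoint-from-" + leaderId);
  }

  public static void main(String[] args) {
    System.out.println(downloadCheckpoint(null).isPresent());
    System.out.println(downloadCheckpoint("om1").isPresent());
  }
}
```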





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293598965
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
 
 Review comment:
   Again, thanks for the great method javadocs!





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293598569
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -171,6 +168,22 @@ public long takeSnapshot() throws IOException {
 return 0;
   }
 
+  /**
+   * Leader OM has purged entries from its log. To catch up, OM must download
 
 Review comment:
   Thanks for adding descriptive javadocs to methods. It really makes code 
review much easier!





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293597766
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -171,6 +168,22 @@ public long takeSnapshot() throws IOException {
 return 0;
   }
 
+  /**
+   * Leader OM has purged entries from its log. To catch up, OM must download
+   * the latest checkpoint from the leader OM and install it.
+   * @param firstTermIndexInLog TermIndex of the first append entry available
+   *   in the Leader's log.
+   * @return the last term index included in the installed snapshot.
+   */
+  public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
+  TermIndex firstTermIndexInLog) {
+// TODO: Raft server should send the leaderId
+String leaderId = null;
+CompletableFuture<TermIndex> future = CompletableFuture
+.supplyAsync(() -> ozoneManager.installSnapshot(leaderId));
 
 Review comment:
   We should not execute this in the default ForkJoinPool. That can suffer from 
thread exhaustion/deadlock issues since there are very few threads in the 
default pool.
   
   Instead use the overload of `supplyAsync` that accepts an `Executor`.
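
   A minimal sketch of that overload follows. It is not the PR's code: `installSnapshot` is a stand-in that returns a fixed index instead of a `TermIndex`, and the executor name and its single-thread sizing are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class InstallSnapshotExecutorSketch {

  // Hypothetical stand-in for ozoneManager.installSnapshot(leaderId);
  // returns a fixed snapshot index instead of a TermIndex.
  static long installSnapshot(String leaderId) {
    return 42L;
  }

  // Runs the install on the supplied executor rather than on
  // ForkJoinPool.commonPool().
  static CompletableFuture<Long> notifyInstallSnapshot(
      String leaderId, ExecutorService executor) {
    return CompletableFuture.supplyAsync(
        () -> installSnapshot(leaderId), executor);
  }

  public static void main(String[] args) throws Exception {
    // A dedicated single thread: snapshot installs are rare and should
    // not run concurrently with each other.
    ExecutorService installSnapshotExecutor =
        Executors.newSingleThreadExecutor();
    try {
      long index = notifyInstallSnapshot("om1", installSnapshotExecutor).get();
      System.out.println("Installed snapshot index: " + index);
    } finally {
      installSnapshotExecutor.shutdown();
    }
  }
}
```

   The executor would need to be owned by the state machine and shut down with it; a long-running download then occupies only its own thread instead of one from the shared common pool.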





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-06-13 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293597419
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -171,6 +168,22 @@ public long takeSnapshot() throws IOException {
 return 0;
   }
 
+  /**
+   * Leader OM has purged entries from its log. To catch up, OM must download
+   * the latest checkpoint from the leader OM and install it.
+   * @param firstTermIndexInLog TermIndex of the first append entry available
+   *   in the Leader's log.
+   * @return the last term index included in the installed snapshot.
+   */
+  public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
 
 Review comment:
   Please add the `@Override` annotation.
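
   For reference, the annotation makes the compiler verify that the method really overrides a supertype method, so a later signature change upstream becomes a compile error rather than a silently unused method. The sketch below uses a simplified, hypothetical `StateMachine` interface with `long` indices in place of `TermIndex`.

```java
import java.util.concurrent.CompletableFuture;

class OverrideSketch {

  // Hypothetical, simplified stand-in for the state machine base type.
  interface StateMachine {
    default CompletableFuture<Long> notifyInstallSnapshotFromLeader(
        long firstLogIndex) {
      return CompletableFuture.completedFuture(-1L);
    }
  }

  static class OmStateMachine implements StateMachine {
    // If the interface method were renamed or its parameters changed,
    // this declaration would stop compiling because of @Override.
    @Override
    public CompletableFuture<Long> notifyInstallSnapshotFromLeader(
        long firstLogIndex) {
      return CompletableFuture.completedFuture(firstLogIndex - 1);
    }
  }

  public static void main(String[] args) {
    StateMachine sm = new OmStateMachine();
    System.out.println(sm.notifyInstallSnapshotFromLeader(10L).join());
  }
}
```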

