[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r271915226 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1617,7 +1617,7 @@ ozone.om.ratis.snapshot.auto.trigger.threshold -40L +40 Review comment: Thanks for the info. We can tweak this later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r271446311 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1617,7 +1617,7 @@ ozone.om.ratis.snapshot.auto.trigger.threshold -40L +40 Review comment: Then I think I having less value make sense. Do we want to revisit this later, for now just go with ratis default value of 400k? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270634602 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1617,7 +1617,7 @@ ozone.om.ratis.snapshot.auto.trigger.threshold -40L +40 Review comment: Question: If we have taken a snapshot for every 400k, then after that 200k transactions have happened, then when follower OM restart's because it knows it has till 400k only, so will it apply 200k transactions again? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270634511 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -161,7 +161,10 @@ public TransactionContext startTransaction( @Override public long takeSnapshot() throws IOException { LOG.info("Saving Ratis snapshot on the OM."); -return ozoneManager.saveRatisSnapshot(); +if (ozoneManager != null) { + return ozoneManager.saveRatisSnapshot(); +} +return 0; Review comment: Question: Here we are returning 0,(Should we return the lastAppliedIndex we are writing) How this will be used by Ratis? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270125836 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1603,18 +1603,27 @@ ozone.om.ratis.log.appender.queue.num-elements 1024 -OZONE, DEBUG, CONTAINER, RATIS +OZONE, DEBUG, OM, RATIS Number of operation pending with Raft's Log Worker. ozone.om.ratis.log.appender.queue.byte-limit 32MB -OZONE, DEBUG, CONTAINER, RATIS +OZONE, DEBUG, OM, RATIS Byte limit for Raft's Log Worker queue. + +ozone.om.ratis.snapshot.auto.trigger.threshold +40L Review comment: Why we have taken 400k as default, any reason for this value? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270115982 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java ## @@ -130,11 +131,13 @@ private OzoneManagerRatisServer(Configuration conf, LOG.info("Instantiating OM Ratis server with GroupID: {} and " + "Raft Peers: {}", raftGroupIdStr, raftPeersStr.toString().substring(2)); +this.omStateMachine = getStateMachine(this.raftGroupId); Review comment: Here, we are passing raftGroupId to getStateMachine, but we are not using that param in method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270125514 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHA.java ## @@ -534,4 +536,61 @@ public void testReadRequest() throws Exception { proxyProvider.getCurrentProxyOMNodeId()); } } + + @Test + public void testOMRatisSnapshot() throws Exception { +String userName = "user" + RandomStringUtils.randomNumeric(5); +String adminName = "admin" + RandomStringUtils.randomNumeric(5); +String volumeName = "volume" + RandomStringUtils.randomNumeric(5); +String bucketName = "bucket" + RandomStringUtils.randomNumeric(5); + +VolumeArgs createVolumeArgs = VolumeArgs.newBuilder() +.setOwner(userName) +.setAdmin(adminName) +.build(); + +objectStore.createVolume(volumeName, createVolumeArgs); +OzoneVolume retVolumeinfo = objectStore.getVolume(volumeName); + +retVolumeinfo.createBucket(bucketName); +OzoneBucket ozoneBucket = retVolumeinfo.getBucket(bucketName); + +String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider() +.getCurrentProxyOMNodeId(); +OzoneManager ozoneManager = cluster.getOzoneManager(leaderOMNodeId); + +// Send commands to ratis to increase the log index so that ratis +// triggers a snapshot on the state machine. + +long appliedLogIndex = 0; +while (appliedLogIndex <= SNAPSHOT_THRESHOLD) { + String keyName = "key" + RandomStringUtils.randomNumeric(5); + String data = "data" + RandomStringUtils.randomNumeric(5); + OzoneOutputStream ozoneOutputStream = ozoneBucket.createKey(keyName, + data.length(), ReplicationType.STAND_ALONE, + ReplicationFactor.ONE, new HashMap<>()); + ozoneOutputStream.write(data.getBytes(), 0, data.length()); + ozoneOutputStream.close(); + + appliedLogIndex = ozoneManager.getOmRatisServer() + .getStateMachineLastAppliedIndex(); +} + +GenericTestUtils.waitFor(() -> { + if (ozoneManager.loadRatisSnapshotIndex() > 0) { +return true; + } Review comment: Test LGTM, as we are writing to the same file, can we add another round of SNAPSHOT_THRESHOLD requests, and then check whether the value is expected or not. As here, for every SNAPSHOT_THRESHOLD we write to the same file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270117723 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java ## @@ -130,11 +131,13 @@ private OzoneManagerRatisServer(Configuration conf, LOG.info("Instantiating OM Ratis server with GroupID: {} and " + "Raft Peers: {}", raftGroupIdStr, raftPeersStr.toString().substring(2)); +this.omStateMachine = getStateMachine(this.raftGroupId); Review comment: We are passing raftGroupId, but it is not used in the getStateMachine This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270117510 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java ## @@ -186,7 +189,7 @@ public static OzoneManagerRatisServer newOMRatisServer( raftPeers.add(raftPeer); } -return new OzoneManagerRatisServer(ozoneConf, om, omServiceId, +return new OzoneManagerRatisServer(ozoneConf, omProtocol, omServiceId, Review comment: I think here it should be om? or we should change the method parameter name to omProtocol. Currently code is not getting compiled. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270120787 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -115,7 +117,60 @@ public TransactionContext startTransaction( return ctxt; } return handleStartTransactionRequests(raftClientRequest, omRequest); + } + + /* + * Apply a committed log entry to the state machine. + */ + @Override + public CompletableFuture applyTransaction(TransactionContext trx) { +try { + OMRequest request = OMRatisHelper.convertByteStringToOMRequest( + trx.getStateMachineLogEntry().getLogData()); + long trxLogIndex = trx.getLogEntry().getIndex(); + CompletableFuture future = CompletableFuture + .supplyAsync(() -> runCommand(request, trxLogIndex)); + return future; +} catch (IOException e) { + return completeExceptionally(e); +} + } + + /** + * Query the state machine. The request must be read-only. + */ + @Override + public CompletableFuture query(Message request) { +try { + OMRequest omRequest = OMRatisHelper.convertByteStringToOMRequest( + request.getContent()); + return CompletableFuture.completedFuture(queryCommand(omRequest)); +} catch (IOException e) { + return completeExceptionally(e); +} + } + + /** + * Take OM Ratis snapshot. Write the snapshot index to file. Snapshot index + * is the log index corresponding to the last applied transaction on the OM + * State Machine. + * + * @return the last applied index on the state machine which has been + * stored in the snapshot file. + */ + @Override + public long takeSnapshot() throws IOException { +LOG.info("Saving Ratis snapshot on the OM."); +return ozoneManager.saveRatisSnapshot(); Review comment: Question: Can we consider this applied and when the snapshot is taken can we consider that it is completed? As writing to rocksdb means is it is not written to disk(As we are writing with sync false), so when we take a snapshot, we should also flush the DB to make sure the applied index transaction is applied to OM DB. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM
bharatviswa504 commented on a change in pull request #651: HDDS-1339. Implement ratis snapshots on OM URL: https://github.com/apache/hadoop/pull/651#discussion_r270122895 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -308,56 +357,35 @@ private IOException constructExceptionForFailedRequest( STATUS_CODE + omResponse.getStatus()); } - /* - * Apply a committed log entry to the state machine. - */ - @Override - public CompletableFuture applyTransaction(TransactionContext trx) { -try { - OMRequest request = OMRatisHelper.convertByteStringToOMRequest( - trx.getStateMachineLogEntry().getLogData()); - CompletableFuture future = CompletableFuture - .supplyAsync(() -> runCommand(request)); - return future; -} catch (IOException e) { - return completeExceptionally(e); -} - } - /** - * Query the state machine. The request must be read-only. + * Submits write request to OM and returns the response Message. + * @param request OMRequest + * @return response from OM + * @throws ServiceException */ - @Override - public CompletableFuture query(Message request) { -try { - OMRequest omRequest = OMRatisHelper.convertByteStringToOMRequest( - request.getContent()); - return CompletableFuture.completedFuture(runCommand(omRequest)); -} catch (IOException e) { - return completeExceptionally(e); + private Message runCommand(OMRequest request, long trxLogIndex) { +OMResponse response = handler.handle(request); +if (response.getSuccess()) { Review comment: Why we have checked getSuccess here, then considered that as lastAppliedIndex. As when there are cases like bucket creation failed for an already existing bucket, the success will be set false. But that transaction request is successfully completed. This one also should be considered as applied only right?(Even though it does not mutate om DB, but the transaction request has been completed) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org