[jira] [Commented] (HADOOP-16398) Exports Hadoop metrics to Prometheus
[ https://issues.apache.org/jira/browse/HADOOP-16398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888534#comment-16888534 ]

Akira Ajisaka commented on HADOOP-16398:
----------------------------------------

Hi [~elek] and [~anu], would you review this? After this issue is resolved, I'd like to parse NNTop metrics for Prometheus.

> Exports Hadoop metrics to Prometheus
> ------------------------------------
>
>                 Key: HADOOP-16398
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16398
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Akira Ajisaka
>            Assignee: Akira Ajisaka
>            Priority: Major
>         Attachments: HADOOP-16398.001.patch
>
>
> Hadoop common side of HDDS-846. HDDS already has its own
> PrometheusMetricsSink, so we can reuse the implementation.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
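For readers unfamiliar with the sink being reused: Prometheus expects flat snake_case metric names, so a sink like PrometheusMetricsSink has to normalize Hadoop's CamelCase metric names before exporting them. A standalone sketch of that kind of normalization (the class name and regexes below are illustrative, not the actual HDDS implementation):

```java
// Illustrative sketch of CamelCase -> snake_case metric-name normalization,
// the kind of transformation a Prometheus metrics sink performs.
// Not the actual PrometheusMetricsSink code.
public class PrometheusNameSketch {
  static String toPrometheusName(String recordName, String metricName) {
    String combined = recordName + metricName;
    // Insert '_' at lower/upper boundaries (and acronym boundaries),
    // then lower-case and replace any remaining illegal characters.
    String snake = combined
        .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
        .replaceAll("([A-Z]+)([A-Z][a-z])", "$1_$2");
    return snake.toLowerCase().replaceAll("[^a-z0-9_]", "_");
  }

  public static void main(String[] args) {
    // "RpcQueueTimeAvgTime" -> "rpc_queue_time_avg_time"
    System.out.println(toPrometheusName("Rpc", "QueueTimeAvgTime"));
  }
}
```

The real sink also attaches record tags as Prometheus labels; this sketch only covers the name mangling.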
[GitHub] [hadoop] Cosss7 opened a new pull request #1129: HDFS-14509 DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x
Cosss7 opened a new pull request #1129: HDFS-14509 DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x
URL: https://github.com/apache/hadoop/pull/1129

   reference to jira

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16431) Remove useless log in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888499#comment-16888499 ]

Hadoop QA commented on HADOOP-16431:
------------------------------------

-1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 0m 34s | Docker mode activated. |
| | | | Prechecks |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | trunk Compile Tests |
| 0 | mvndep | 1m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 26s | trunk passed |
| +1 | compile | 15m 48s | trunk passed |
| +1 | checkstyle | 2m 17s | trunk passed |
| +1 | mvnsite | 1m 50s | trunk passed |
| +1 | shadedclient | 16m 29s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 19s | trunk passed |
| +1 | javadoc | 1m 28s | trunk passed |
| | | | Patch Compile Tests |
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 16m 1s | the patch passed |
| +1 | javac | 16m 1s | the patch passed |
| +1 | checkstyle | 2m 22s | the patch passed |
| +1 | mvnsite | 1m 53s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 2s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 42s | the patch passed |
| +1 | javadoc | 1m 27s | the patch passed |
| | | | Other Tests |
| +1 | unit | 8m 36s | hadoop-common in the patch passed. |
| +1 | unit | 0m 30s | hadoop-openstack in the patch passed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 107m 1s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HADOOP-16431 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975213/HADOOP-16431.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0da43f6c67e5 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d545f9c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| Test Results |
[GitHub] [hadoop] Cosss7 opened a new pull request #1128: HDFS-14551 NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x
Cosss7 opened a new pull request #1128: HDFS-14551 NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x
URL: https://github.com/apache/hadoop/pull/1128

   reference jira.
[GitHub] [hadoop] ChenSammi commented on a change in pull request #1112: HDDS-1713. ReplicationManager fail to find proper node topology based…
ChenSammi commented on a change in pull request #1112: HDDS-1713. ReplicationManager fail to find proper node topology based…
URL: https://github.com/apache/hadoop/pull/1112#discussion_r305190794

## File path: hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementRackAware.java

## @@ -137,10 +137,6 @@ public void chooseNodeWithNoExcludedNodes() throws SCMException {
         datanodeDetails.get(2)));
     Assert.assertFalse(cluster.isSameParent(datanodeDetails.get(1),
         datanodeDetails.get(2)));
-    Assert.assertFalse(cluster.isSameParent(datanodeDetails.get(0),
-        datanodeDetails.get(3)));
-    Assert.assertFalse(cluster.isSameParent(datanodeDetails.get(2),
-        datanodeDetails.get(3)));

Review comment:
   Thanks for the comments. Will remove the last two assertions in testFallback.
[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305187064

## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java

## @@ -123,6 +123,9 @@ private OMConfigKeys() {
       "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+      "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;

Review comment:
   Filed [HDDS-1831](https://issues.apache.org/jira/browse/HDDS-1831).
[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305186358

## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java

## @@ -123,6 +123,9 @@ private OMConfigKeys() {
       "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+      "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;

Review comment:
   Good suggestion! Let me file a followup jira to fix that. Want to get this patch committed today; it's been hanging around for over a month.
[GitHub] [hadoop] ChenSammi merged pull request #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…
ChenSammi merged pull request #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…
URL: https://github.com/apache/hadoop/pull/1067
[GitHub] [hadoop] ChenSammi commented on issue #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…
ChenSammi commented on issue #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…
URL: https://github.com/apache/hadoop/pull/1067#issuecomment-513070699

   +1, will commit shortly.
[GitHub] [hadoop] xiaoyuyao commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…
xiaoyuyao commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…
URL: https://github.com/apache/hadoop/pull/1124#issuecomment-513068886

   Randomizing is good for balancing the load. However, for writes we still must go through the leader (the first node). For reads, we can only use the random optimization for closed containers.
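The constraint xiaoyuyao describes, randomizing the node list for read load-balancing while keeping the pipeline leader in front for writes, can be sketched as follows. This is an illustrative sketch, not the actual Ozone client code; the method name and the seed parameter are invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: shuffle the datanodes of a pipeline for read
// load-balancing while keeping the leader (first node) in place, since
// writes must still go through it.
public class ShuffleNonLeader {
  static List<String> shuffleKeepingLeader(List<String> nodes, long seed) {
    if (nodes.size() <= 2) {
      return new ArrayList<>(nodes); // nothing meaningful to shuffle
    }
    List<String> tail = new ArrayList<>(nodes.subList(1, nodes.size()));
    Collections.shuffle(tail, new Random(seed));
    List<String> result = new ArrayList<>();
    result.add(nodes.get(0)); // leader stays first
    result.addAll(tail);
    return result;
  }

  public static void main(String[] args) {
    List<String> pipeline = Arrays.asList("dn-leader", "dn-2", "dn-3");
    List<String> shuffled = shuffleKeepingLeader(pipeline, 42L);
    System.out.println(shuffled.get(0)); // always the leader
    System.out.println(shuffled.size());
  }
}
```

Per the comment above, even this restricted shuffle is only safe for reads against closed containers; open containers still read from the leader.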
[GitHub] [hadoop] anuengineer commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
anuengineer commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305183995

## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java

## @@ -123,6 +123,9 @@ private OMConfigKeys() {
       "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+      "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;

Review comment:
   Can we please use the new format for configs? Here are some examples: https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API
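The "new format" the linked wiki describes is an annotation-driven configuration API, where a typed config class replaces string-constant key/default pairs. As a self-contained illustration of that general pattern only: the @Config annotation and injector below are invented for this sketch and differ from the real Ozone API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Self-contained illustration of annotation-driven config injection.
// The annotation and injector are hypothetical; the real Ozone API
// differs in names and features (types, tags, descriptions, etc.).
public class AnnotationConfigSketch {
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Config {
    String key();
    String defaultValue();
  }

  static class RatisConfig {
    @Config(key = "ozone.om.ratis.log.purge.gap", defaultValue = "100")
    int logPurgeGap;
  }

  // Populate annotated fields from a property map, falling back to the
  // declared default when the key is absent.
  static void inject(Object target, Map<String, String> props) throws Exception {
    for (Field f : target.getClass().getDeclaredFields()) {
      Config c = f.getAnnotation(Config.class);
      if (c == null) continue;
      String v = props.getOrDefault(c.key(), c.defaultValue());
      f.setAccessible(true);
      if (f.getType() == int.class) {
        f.setInt(target, Integer.parseInt(v));
      } else {
        f.set(target, v);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    RatisConfig conf = new RatisConfig();
    inject(conf, new HashMap<>());        // falls back to default
    System.out.println(conf.logPurgeGap); // 100
    Map<String, String> props = new HashMap<>();
    props.put("ozone.om.ratis.log.purge.gap", "250");
    inject(conf, props);
    System.out.println(conf.logPurgeGap); // 250
  }
}
```

The appeal over the OMConfigKeys constants in the diff above is that the key, default, and field type live in one declaration instead of three parallel constants.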
[jira] [Updated] (HADOOP-16431) Remove useless log in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lisheng Sun updated HADOOP-16431:
---------------------------------
    Priority: Minor  (was: Major)

> Remove useless log in IOUtils.java and ExceptionDiags.java
> ----------------------------------------------------------
>
>                 Key: HADOOP-16431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16431
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Minor
>         Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warn message
> and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
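The helper quoted in the issue description can be demonstrated standalone. The sketch below (hypothetical class name, and assuming the generic signature is `<T extends Throwable>`, which the mail rendering stripped) shows the technique: clone the exception type via its `(String)` constructor and chain the original as the cause.

```java
import java.lang.reflect.Constructor;

// Standalone demo of the reflective "wrap with message" technique from the
// quoted IOUtils helper. Class name is hypothetical.
public class WrapDemo {
  @SuppressWarnings("unchecked")
  static <T extends Throwable> T wrapWithMessage(T exception, String msg) {
    Class<? extends Throwable> clazz = exception.getClass();
    try {
      Constructor<? extends Throwable> ctor = clazz.getConstructor(String.class);
      Throwable t = ctor.newInstance(msg);
      return (T) t.initCause(exception);
    } catch (Throwable e) {
      // No (String) constructor: return the original unchanged. The patch
      // under review argues this case is not worth a WARN-level log.
      return exception;
    }
  }

  public static void main(String[] args) {
    IllegalStateException cause = new IllegalStateException("disk failed");
    IllegalStateException wrapped =
        wrapWithMessage(cause, "while reading /foo: disk failed");
    System.out.println(wrapped.getMessage());
    System.out.println(wrapped.getCause() == cause); // original is preserved
  }
}
```

The point of the reflection is that callers get back the same exception type they passed in, with extra context in the message and the original stack preserved as the cause.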
[jira] [Updated] (HADOOP-16431) Remove useless log in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lisheng Sun updated HADOOP-16431:
---------------------------------
    Summary: Remove useless log in IOUtils.java and ExceptionDiags.java  (was: Change Log Level to trace in IOUtils.java and ExceptionDiags.java)

> Remove useless log in IOUtils.java and ExceptionDiags.java
> ----------------------------------------------------------
>
>                 Key: HADOOP-16431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16431
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>         Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warn message
> and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
[jira] [Commented] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888450#comment-16888450 ]

Lisheng Sun commented on HADOOP-16431:
--------------------------------------

Thanks [~elgoiri] for your good suggestions. I have updated the patch. Could you help review it, and assign this issue to me? Thank you.

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16431
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>         Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warn message
> and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
[GitHub] [hadoop] arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-513063053

   /retest
[jira] [Updated] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lisheng Sun updated HADOOP-16431:
---------------------------------
    Attachment: HADOOP-16431.002.patch

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16431
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>         Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warn message
> and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
[jira] [Commented] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888439#comment-16888439 ]

Íñigo Goiri commented on HADOOP-16431:
--------------------------------------

Given that log-and-throw is not a very good approach, I guess the right thing would be to just not log it at all.

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16431
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>         Attachments: HADOOP-16431.001.patch
>
>
> When there is no String constructor for the exception, we log a warn message
> and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
[GitHub] [hadoop] mackrorysd commented on issue #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration
mackrorysd commented on issue #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration
URL: https://github.com/apache/hadoop/pull/1125#issuecomment-513057068

   The common unit test failure is unrelated: the patch doesn't change common or anything it depends on. Not including other tests because this is a performance tuning; see the JIRA for numbers from performance testing. An identical patch was +1'd on JIRA, so I will merge in 12 hours if I don't hear otherwise.
[jira] [Commented] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java
[ https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888429#comment-16888429 ]

Lisheng Sun commented on HADOOP-16431:
--------------------------------------

[~linyiqun] [~jojochuang] [~hexiaoqiao] [~elgoiri] Could you help review this patch? Thank you.

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16431
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>         Attachments: HADOOP-16431.001.patch
>
>
> When there is no String constructor for the exception, we log a warn message
> and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
[GitHub] [hadoop] hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-513052953

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 38 | Docker mode activated. |
   | | | | _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
   | | | | _ trunk Compile Tests _ |
   | 0 | mvndep | 66 | Maven dependency ordering for branch |
   | +1 | mvninstall | 482 | trunk passed |
   | +1 | compile | 265 | trunk passed |
   | +1 | checkstyle | 77 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 865 | branch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 187 | trunk passed |
   | 0 | spotbugs | 316 | Used deprecated FindBugs config; considering switching to SpotBugs. |
   | +1 | findbugs | 509 | trunk passed |
   | | | | _ Patch Compile Tests _ |
   | 0 | mvndep | 74 | Maven dependency ordering for patch |
   | +1 | mvninstall | 435 | the patch passed |
   | +1 | compile | 271 | the patch passed |
   | +1 | cc | 271 | the patch passed |
   | +1 | javac | 271 | the patch passed |
   | +1 | checkstyle | 73 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 1 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 648 | patch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 159 | the patch passed |
   | +1 | findbugs | 536 | the patch passed |
   | | | | _ Other Tests _ |
   | +1 | unit | 277 | hadoop-hdds in the patch passed. |
   | -1 | unit | 1639 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 50 | The patch does not generate ASF License warnings. |
   | | | 6814 | |

   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.ozone.TestSecureOzoneCluster |
   | | hadoop.ozone.om.snapshot.TestOzoneManagerSnapshotProvider |
   | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   | | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   | | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   | | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   | | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   | | hadoop.ozone.container.server.TestSecureContainerServer |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/948 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc |
   | uname | Linux 0196a0d19d2c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d545f9c |
   | Default Java | 1.8.0_212 |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/artifact/out/patch-unit-hadoop-ozone.txt |
   | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/testReport/ |
   | Max. process+thread count | 5388 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-ozone/common hadoop-ozone/integration-test hadoop-ozone/ozone-manager U: . |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

   This message was automatically generated.
[GitHub] [hadoop] commanderchewbacca closed pull request #1127: Gcs connector
commanderchewbacca closed pull request #1127: Gcs connector
URL: https://github.com/apache/hadoop/pull/1127
[GitHub] [hadoop] commanderchewbacca opened a new pull request #1127: Gcs connector
commanderchewbacca opened a new pull request #1127: Gcs connector
URL: https://github.com/apache/hadoop/pull/1127

   added gcs connector for operator-metering
[GitHub] [hadoop] commanderchewbacca closed pull request #1126: Gcs connector
commanderchewbacca closed pull request #1126: Gcs connector
URL: https://github.com/apache/hadoop/pull/1126
[GitHub] [hadoop] commanderchewbacca opened a new pull request #1126: Gcs connector
commanderchewbacca opened a new pull request #1126: Gcs connector URL: https://github.com/apache/hadoop/pull/1126 added gcs connector to images to allow for gcs use for operator-metering
[jira] [Commented] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
[ https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888419#comment-16888419 ] Josh Rosen commented on HADOOP-16437: - A fallback configuration is an interesting idea. I guess the addition of a new configuration alias for the typo would, itself, be a behavior change because what was previously a no-op would start having an actual effect (so maybe we'd want to {{releasenotes}} that?). > Documentation typos: fs.s3a.experimental.fadvise -> > fs.s3a.experimental.input.fadvise > - > > Key: HADOOP-16437 > URL: https://issues.apache.org/jira/browse/HADOOP-16437 > Project: Hadoop Common > Issue Type: Bug > Components: documentation, fs/s3 >Affects Versions: 3.2.0, 3.3.0, 3.1.2 >Reporter: Josh Rosen >Priority: Major > Fix For: 3.3.0 > > > The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I > believe this is a typo: the actual configuration key that gets read is > {{fs.s3a.experimental.input.fadvise}}. > I'll submit a PR to fix this.
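Hadoop's `Configuration` class already supports this kind of aliasing through key deprecation, where a value set under an old key is honored (with a logged warning) when the new key is read. As a self-contained illustration of the fallback-lookup idea under discussion — this is a sketch, not Hadoop's actual `Configuration` implementation; only the typo/canonical key pair comes from this issue:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of a fallback-key lookup: if the canonical key is
 *  unset but a registered misspelled alias is present, use the alias's
 *  value and warn. Not Hadoop's real Configuration class. */
class FallbackConfig {
    private final Map<String, String> props = new HashMap<>();
    // maps a deprecated/typo key to its canonical replacement
    private final Map<String, String> fallbacks = new HashMap<>();

    void addFallback(String oldKey, String newKey) {
        fallbacks.put(oldKey, newKey);
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        // a value set under the canonical key always wins
        if (props.containsKey(key)) {
            return props.get(key);
        }
        // otherwise look for a deprecated alias pointing at this key
        for (Map.Entry<String, String> e : fallbacks.entrySet()) {
            if (e.getValue().equals(key) && props.containsKey(e.getKey())) {
                System.err.println("WARN: " + e.getKey()
                    + " is deprecated; use " + key);
                return props.get(e.getKey());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FallbackConfig conf = new FallbackConfig();
        conf.addFallback("fs.s3a.experimental.fadvise",
                         "fs.s3a.experimental.input.fadvise");
        conf.set("fs.s3a.experimental.fadvise", "random"); // the typo key
        // the canonical key now resolves to the aliased value
        System.out.println(conf.get("fs.s3a.experimental.input.fadvise"));
    }
}
```

If such an alias were added, a value set under the misspelled key would start taking effect when the canonical key is read, which is exactly the behavior change flagged above as worth a release note.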
[GitHub] [hadoop] hadoop-yetus commented on issue #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration
hadoop-yetus commented on issue #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration URL: https://github.com/apache/hadoop/pull/1125#issuecomment-513048431 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 65 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 23 | Maven dependency ordering for branch | | +1 | mvninstall | 1162 | trunk passed | | +1 | compile | 1033 | trunk passed | | +1 | checkstyle | 154 | trunk passed | | +1 | mvnsite | 128 | trunk passed | | +1 | shadedclient | 1091 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 105 | trunk passed | | 0 | spotbugs | 81 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 223 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 26 | Maven dependency ordering for patch | | +1 | mvninstall | 97 | the patch passed | | +1 | compile | 1059 | the patch passed | | +1 | javac | 1059 | the patch passed | | +1 | checkstyle | 191 | the patch passed | | +1 | mvnsite | 132 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | xml | 2 | The patch has no ill-formed XML file. | | +1 | shadedclient | 737 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 96 | the patch passed | | +1 | findbugs | 214 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 546 | hadoop-common in the patch failed. | | +1 | unit | 310 | hadoop-aws in the patch passed. | | +1 | asflicense | 54 | The patch does not generate ASF License warnings. 
| | | | 7454 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.security.TestFixKerberosTicketOrder | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1125 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle | | uname | Linux eb09206b2aa1 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / d545f9c | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/testReport/ | | Max. process+thread count | 1345 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated.
[GitHub] [hadoop] hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#issuecomment-513043559 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 98 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 1 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 24 | Maven dependency ordering for branch | | +1 | mvninstall | 527 | trunk passed | | +1 | compile | 269 | trunk passed | | +1 | checkstyle | 76 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 959 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 182 | trunk passed | | 0 | spotbugs | 375 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 606 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 33 | Maven dependency ordering for patch | | +1 | mvninstall | 442 | the patch passed | | +1 | compile | 279 | the patch passed | | +1 | cc | 279 | the patch passed | | +1 | javac | 279 | the patch passed | | +1 | checkstyle | 85 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | xml | 1 | The patch has no ill-formed XML file. | | +1 | shadedclient | 764 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 169 | the patch passed | | +1 | findbugs | 537 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 336 | hadoop-hdds in the patch passed. | | -1 | unit | 2175 | hadoop-ozone in the patch failed. | | +1 | asflicense | 55 | The patch does not generate ASF License warnings. 
| | | | 7789 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.om.snapshot.TestOzoneManagerSnapshotProvider | | | hadoop.ozone.client.rpc.TestFailureHandlingByClient | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.container.server.TestSecureContainerServer | | | hadoop.ozone.client.rpc.Test2WayCommitInRatis | | | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/948 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc | | uname | Linux f2938c9c233e 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / d545f9c | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/testReport/ | | Max. process+thread count | 4968 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/common hadoop-ozone/common hadoop-ozone/integration-test hadoop-ozone/ozone-manager U: . 
| | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated.
[GitHub] [hadoop] hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#issuecomment-513032608 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 70 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 22 | Maven dependency ordering for branch | | +1 | mvninstall | 487 | trunk passed | | +1 | compile | 305 | trunk passed | | +1 | checkstyle | 74 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 965 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 164 | trunk passed | | 0 | spotbugs | 349 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 553 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 32 | Maven dependency ordering for patch | | -1 | mvninstall | 160 | hadoop-ozone in the patch failed. | | -1 | compile | 62 | hadoop-ozone in the patch failed. | | -1 | cc | 62 | hadoop-ozone in the patch failed. | | -1 | javac | 62 | hadoop-ozone in the patch failed. | | +1 | checkstyle | 84 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | xml | 2 | The patch has no ill-formed XML file. | | +1 | shadedclient | 756 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 180 | the patch passed | | -1 | findbugs | 108 | hadoop-ozone in the patch failed. | ||| _ Other Tests _ | | +1 | unit | 339 | hadoop-hdds in the patch passed. | | -1 | unit | 118 | hadoop-ozone in the patch failed. 
| | +1 | asflicense | 38 | The patch does not generate ASF License warnings. | | | | 5149 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/948 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc | | uname | Linux 5fd557cb2f6c 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / d5ef38b | | Default Java | 1.8.0_212 | | mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-mvninstall-hadoop-ozone.txt | | compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-compile-hadoop-ozone.txt | | cc | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-compile-hadoop-ozone.txt | | javac | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-compile-hadoop-ozone.txt | | findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-findbugs-hadoop-ozone.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/testReport/ | | Max. process+thread count | 411 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/common hadoop-ozone/common hadoop-ozone/integration-test hadoop-ozone/ozone-manager U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. 
[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration
[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888395#comment-16888395 ] Sean Mackrory commented on HADOOP-13868: Bit-rot after only 2 1/2 years? Imagine that! Actually the only part that doesn't apply cleanly is the documentation, and that's just because it's looking 100 lines away from where it should. Resubmitted as a pull request to verify a clean Yetus run, but as the patch is virtually identical I'll assume your +1 still applies unless I hear otherwise. > New defaults for S3A multi-part configuration > - > > Key: HADOOP-13868 > URL: https://issues.apache.org/jira/browse/HADOOP-13868 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.7.0, 3.0.0-alpha1 >Reporter: Sean Mackrory >Assignee: Sean Mackrory >Priority: Major > Attachments: HADOOP-13868.001.patch, HADOOP-13868.002.patch, > optimizing-multipart-s3a.sh > > > I've been looking at a big performance regression when writing to S3 from > Spark that appears to have been introduced with HADOOP-12891. > In the Amazon SDK, the default threshold for multi-part copies is 320x the > threshold for multi-part uploads (and the block size is 20x bigger), so I > don't think it's necessarily wise for us to have them be the same. > I did some quick tests and it seems to me the sweet spot when multi-part > copies start being faster is around 512MB. It wasn't as significant, but > using 104857600 (Amazon's default) for the blocksize was also slightly better. > I propose we do the following, although they're independent decisions: > (1) Split the configuration. Ideally, I'd like to have > fs.s3a.multipart.copy.threshold and fs.s3a.multipart.upload.threshold (and > corresponding properties for the block size). But then there's the question > of what to do with the existing fs.s3a.multipart.* properties. Deprecation? > Leave it as a short-hand for configuring both (that's overridden by the more > specific properties?). 
> (2) Consider increasing the default values. In my tests, 256 MB seemed to be > where multipart uploads came into their own, and 512 MB was where multipart > copies started outperforming the alternative. Would be interested to hear > what other people have seen.
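The precedence scheme floated in (1) — keep `fs.s3a.multipart.*` as a shorthand that the more specific per-operation keys override — can be sketched as a simple lookup. This is illustrative only: the `fs.s3a.multipart.copy.threshold` / `fs.s3a.multipart.upload.threshold` names are the ones proposed in this issue, not shipped configuration, and the 128 MB fallback default is a placeholder:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the proposed precedence: a specific per-operation key
 *  (e.g. fs.s3a.multipart.copy.threshold) overrides the generic
 *  shorthand (fs.s3a.multipart.threshold) when both are set. */
class MultipartSettings {
    static final long MB = 1024L * 1024L;

    static long effectiveThreshold(Map<String, String> conf, String op) {
        // 1. the operation-specific key wins if present
        String specific = conf.get("fs.s3a.multipart." + op + ".threshold");
        if (specific != null) {
            return Long.parseLong(specific);
        }
        // 2. otherwise fall back to the generic shorthand key
        String generic = conf.get("fs.s3a.multipart.threshold");
        // 3. hypothetical 128 MB default when nothing is configured
        return generic != null ? Long.parseLong(generic) : 128 * MB;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.multipart.threshold", String.valueOf(256 * MB));
        conf.put("fs.s3a.multipart.copy.threshold", String.valueOf(512 * MB));
        // copies resolve to the specific 512 MB value,
        // uploads fall back to the generic 256 MB shorthand
        System.out.println(effectiveThreshold(conf, "copy") / MB);
        System.out.println(effectiveThreshold(conf, "upload") / MB);
    }
}
```

With both keys set as above, copies would use the 512 MB copy threshold suggested by the measurements in this issue, while uploads keep the generic value — which is the "shorthand overridden by the more specific properties" option.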
[GitHub] [hadoop] mackrorysd opened a new pull request #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration
mackrorysd opened a new pull request #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration URL: https://github.com/apache/hadoop/pull/1125
[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration
[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888392#comment-16888392 ] Hadoop QA commented on HADOOP-13868: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HADOOP-13868 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-13868 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842566/HADOOP-13868.002.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16388/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated.
[GitHub] [hadoop] hadoop-yetus commented on issue #1108: HDDS-1805. Implement S3 Initiate MPU request to use Cache and DoubleBuffer.
hadoop-yetus commented on issue #1108: HDDS-1805. Implement S3 Initiate MPU request to use Cache and DoubleBuffer. URL: https://github.com/apache/hadoop/pull/1108#issuecomment-513019346 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 39 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 1 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 11 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 16 | Maven dependency ordering for branch | | +1 | mvninstall | 483 | trunk passed | | +1 | compile | 269 | trunk passed | | +1 | checkstyle | 79 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 863 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 166 | trunk passed | | 0 | spotbugs | 321 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 503 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 26 | Maven dependency ordering for patch | | +1 | mvninstall | 451 | the patch passed | | +1 | compile | 269 | the patch passed | | +1 | cc | 269 | the patch passed | | +1 | javac | 269 | the patch passed | | +1 | checkstyle | 87 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 692 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 166 | the patch passed | | +1 | findbugs | 527 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 294 | hadoop-hdds in the patch passed. | | -1 | unit | 1616 | hadoop-ozone in the patch failed. | | +1 | asflicense | 56 | The patch does not generate ASF License warnings. 
| | | | 6791 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.container.server.TestSecureContainerServer | | | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException | | | hadoop.ozone.client.rpc.TestFailureHandlingByClient | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1108 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux cb07dd5ad9f5 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / d5ef38b | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/testReport/ | | Max. process+thread count | 5157 (vs. ulimit of 5500) | | modules | C: hadoop-ozone/common hadoop-ozone/ozone-manager U: hadoop-ozone | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. 
[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration
[ https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888382#comment-16888382 ] Steve Loughran commented on HADOOP-13868: - LGTM +1
[jira] [Commented] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
[ https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888377#comment-16888377 ] Hudson commented on HADOOP-16437: - FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16955 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16955/]) HADOOP-16437 documentation typo fix: fs.s3a.experimental.input.fadvise (stevel: rev d545f9c2903fe63f44c1330d9ce55a85de93804f) * (edit) hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstreambuilder.md * (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md * (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
[GitHub] [hadoop] steveloughran merged pull request #1117: HADOOP-16437 documentation typo fix: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
steveloughran merged pull request #1117: HADOOP-16437 documentation typo fix: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise URL: https://github.com/apache/hadoop/pull/1117
[jira] [Commented] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
[ https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888374#comment-16888374 ] Steve Loughran commented on HADOOP-16437: - thanks. Applied. We're probably going to have to backport this a long way aren't we. And probably everywhere else I've written. There's another thing we could do here, given the docs are out and about: we actually add the other entry as a deprecated key. That way people who ask for it, get it + a warning. Thoughts? (oh, and now we have a better openFile() command, I do actually want to make seek policy a standard option we could implement in all the stores consistently, so that ORC/Parquet code can know to ask for it)
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305137102 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the downloaded checkpoint's snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot installation.");
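The backup-and-swap step in replaceOMDBWithCheckpoint above can be sketched in isolation. The following is a minimal, self-contained sketch of the pattern only: it uses plain java.nio.file paths instead of the OM-specific types (DBStore, DBCheckpoint), and the method and backup-prefix names are invented for illustration, not part of the Ozone codebase.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DbSwapSketch {

  /**
   * Move the current DB directory aside as a timestamped backup keyed by the
   * last applied index, then move the downloaded checkpoint into its place.
   * Backing up first means the old state can be restored if the swap fails.
   */
  public static Path swapDbWithCheckpoint(Path currentDb, Path checkpoint,
      long lastAppliedIndex) throws IOException {
    String backupName = "om.db.backup." + lastAppliedIndex + "_"
        + System.currentTimeMillis();
    Path backup = currentDb.resolveSibling(backupName);

    // Back up the current DB; if this fails, nothing has been touched yet.
    Files.move(currentDb, backup);

    try {
      // Move the checkpoint into the current DB's location.
      Files.move(checkpoint, currentDb);
    } catch (IOException e) {
      // Roll back: restore the backup so a usable DB remains in place.
      Files.move(backup, currentDb);
      throw e;
    }
    return backup;
  }
}
```

Because both moves stay within the same parent directory (and thus the same filesystem), each `Files.move` is a cheap directory rename rather than a copy.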
[jira] [Updated] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
[ https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-16437: Fix Version/s: 3.3.0 > Documentation typos: fs.s3a.experimental.fadvise -> > fs.s3a.experimental.input.fadvise > - > > Key: HADOOP-16437 > URL: https://issues.apache.org/jira/browse/HADOOP-16437 > Project: Hadoop Common > Issue Type: Bug > Components: documentation, fs/s3 >Affects Versions: 3.2.0, 3.3.0, 3.1.2 >Reporter: Josh Rosen >Priority: Major > Fix For: 3.3.0 > > > The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I > believe this is a typo: the actual configuration key that gets read is > {{fs.s3a.experimental.input.fadvise}}. > I'll submit a PR to fix this. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
[ https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-16437: Affects Version/s: 3.3.0 3.2.0 3.1.2 > Documentation typos: fs.s3a.experimental.fadvise -> > fs.s3a.experimental.input.fadvise > - > > Key: HADOOP-16437 > URL: https://issues.apache.org/jira/browse/HADOOP-16437 > Project: Hadoop Common > Issue Type: Bug > Components: documentation, fs/s3 >Affects Versions: 3.2.0, 3.3.0, 3.1.2 >Reporter: Josh Rosen >Priority: Major > > The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I > believe this is a typo: the actual configuration key that gets read is > {{fs.s3a.experimental.input.fadvise}}. > I'll submit a PR to fix this. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
[ https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-16437: Component/s: fs/s3 > Documentation typos: fs.s3a.experimental.fadvise -> > fs.s3a.experimental.input.fadvise > - > > Key: HADOOP-16437 > URL: https://issues.apache.org/jira/browse/HADOOP-16437 > Project: Hadoop Common > Issue Type: Bug > Components: documentation, fs/s3 >Reporter: Josh Rosen >Priority: Major > > The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I > believe this is a typo: the actual configuration key that gets read is > {{fs.s3a.experimental.input.fadvise}}. > I'll submit a PR to fix this. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #1117: HADOOP-16437 documentation typo fix: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
steveloughran commented on issue #1117: HADOOP-16437 documentation typo fix: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
URL: https://github.com/apache/hadoop/pull/1117#issuecomment-513008816

+1, committed

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
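For reference, the corrected key as it would appear in a user's core-site.xml — a hypothetical fragment, assuming the `random` input policy value documented for S3A:

```xml
<!-- Hypothetical core-site.xml fragment. Note the corrected key name:
     fs.s3a.experimental.input.fadvise, not fs.s3a.experimental.fadvise. -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
```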
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305134924

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
(quotes the same installSnapshot/replaceOMDBWithCheckpoint hunk as the first comment above)
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305135149

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
(quotes the same installSnapshot hunk as the first comment above, ending at the LOG.error block in the snapshot-index check)

Review comment: Done.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305134884

## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
(quotes the same installSnapshot/replaceOMDBWithCheckpoint hunk as the first comment above)
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305134992 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer ratisServer) { ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true) .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build(); this.executorService = HadoopExecutors.newSingleThreadExecutor(build); +this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor(); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
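The single-thread installSnapshotExecutor in the hunk above suggests a pattern worth sketching: snapshot installation runs off the Ratis apply thread, on its own executor, with the result delivered as a future. The sketch below is a hypothetical, self-contained illustration of that pattern — the SnapshotInstaller interface and all names here are invented for the example, not the OzoneManagerStateMachine API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InstallSnapshotOffThread {

  /** Hypothetical stand-in for the state machine's install hook. */
  public interface SnapshotInstaller {
    long install(String leaderId) throws Exception;
  }

  // Single daemon thread, mirroring a dedicated install-snapshot executor:
  // installs are serialized and never block the apply-transaction thread.
  private final ExecutorService installSnapshotExecutor =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "install-snapshot-thread");
        t.setDaemon(true);
        return t;
      });

  /** Run installation asynchronously; complete with the new snapshot index. */
  public CompletableFuture<Long> notifyInstallSnapshot(
      SnapshotInstaller installer, String leaderId) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        return installer.install(leaderId);
      } catch (Exception e) {
        // Surface the failure through the future instead of losing it.
        throw new CompletionException(e);
      }
    }, installSnapshotExecutor);
  }
}
```

The caller can then wait on, or chain from, the returned future rather than polling the state machine.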
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305113822 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java ## @@ -0,0 +1,193 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.hadoop.ozone.om; + +import org.apache.commons.lang3.RandomStringUtils; +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.ozone.MiniOzoneCluster; +import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneBucket; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.client.OzoneVolume; +import org.apache.hadoop.ozone.client.VolumeArgs; +import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs; +import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer; +import org.apache.hadoop.utils.db.DBCheckpoint; +import org.apache.hadoop.utils.db.Table; +import org.apache.hadoop.utils.db.TableIterator; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; + +import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey; + +/** + * Tests the Ratis snaphsots feature in OM. + */ +public class TestOMRatisSnapshots { + + private MiniOzoneHAClusterImpl cluster = null; + private ObjectStore objectStore; + private OzoneConfiguration conf; + private String clusterId; + private String scmId; + private int numOfOMs = 3; + private static final long SNAPSHOT_THRESHOLD = 50; + private static final int LOG_PURGE_GAP = 50; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + @Rule + public Timeout timeout = new Timeout(3000_000); + + /** + * Create a MiniDFSCluster for testing. The cluster initially has one + * inactive OM. So at the start of the cluster, there will be 2 active and 1 + * inactive OM. 
+ * + * @throws IOException + */ + @Before + public void init() throws Exception { +conf = new OzoneConfiguration(); +clusterId = UUID.randomUUID().toString(); +scmId = UUID.randomUUID().toString(); +conf.setLong( +OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY, +SNAPSHOT_THRESHOLD); +conf.setInt(OMConfigKeys.OZONE_OM_RATIS_LOG_PURGE_GAP, LOG_PURGE_GAP); +cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf) +.setClusterId(clusterId) +.setScmId(scmId) +.setOMServiceId("om-service-test1") +.setNumOfOzoneManagers(numOfOMs) +.setNumOfActiveOMs(2) +.build(); +cluster.waitForClusterToBeReady(); +objectStore = OzoneClientFactory.getRpcClient(conf).getObjectStore(); + } + + /** + * Shutdown MiniDFSCluster. + */ + @After + public void shutdown() { +if (cluster != null) { + cluster.shutdown(); +} + } + + @Test + public void testInstallSnapshot() throws Exception { +// Get the leader OM +String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider() +.getCurrentProxyOMNodeId(); +OzoneManager leaderOM = cluster.getOzoneManager(leaderOMNodeId); +OzoneManagerRatisServer leaderRatisServer = leaderOM.getOmRatisServer(); + +// Find the inactive OM +String followerNodeId = leaderOM.getPeerNodes().get(0).getOMNodeId(); +if (cluster.isOMActive(followerNodeId)) { + followerNodeId = leaderOM.getPeerNodes().get(1).getOMNodeId(); +} +OzoneManager followerOM = cluster.getOzoneManager(followerNodeId); + +// Do some transactions so that the log index increases +String userName = "user" + RandomStringUtils.randomNumeric(5); +String adminName = "admin" + RandomStringUtils.randomNumeric(5); +String volumeName = "volume" + RandomStringUtils.randomNumeric(5); +
[GitHub] [hadoop] hadoop-yetus commented on issue #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…
hadoop-yetus commented on issue #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou… URL: https://github.com/apache/hadoop/pull/1067#issuecomment-512994540 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 76 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | 0 | shelldocs | 1 | Shelldocs was not available. | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 71 | Maven dependency ordering for branch | | +1 | mvninstall | 518 | trunk passed | | +1 | compile | 315 | trunk passed | | +1 | checkstyle | 88 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 871 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 183 | trunk passed | | 0 | spotbugs | 342 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 558 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 41 | Maven dependency ordering for patch | | +1 | mvninstall | 497 | the patch passed | | +1 | compile | 321 | the patch passed | | +1 | javac | 321 | the patch passed | | +1 | checkstyle | 93 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | shellcheck | 1 | There were no new shellcheck issues. | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 814 | patch has no errors when building and testing our client artifacts. | | +1 | javadoc | 218 | the patch passed | | +1 | findbugs | 653 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 125 | hadoop-hdds in the patch failed. | | -1 | unit | 2095 | hadoop-ozone in the patch failed. 
| | +1 | asflicense | 51 | The patch does not generate ASF License warnings. | | | | 7870 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.lock.TestLockManager | | | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException | | | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient | | | hadoop.ozone.client.rpc.TestFailureHandlingByClient | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient | | | hadoop.ozone.container.server.TestSecureContainerServer | | | hadoop.ozone.client.rpc.TestWatchForCommit | | | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer | | | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis | | | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1067 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle shellcheck shelldocs | | uname | Linux 7fc322282b06 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 9838a47 | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/artifact/out/patch-unit-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/testReport/ | | Max. process+thread count | 4862 (vs. 
ulimit of 5500) | | modules | C: hadoop-hdds/tools hadoop-ozone/dist U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/console | | versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hadoop] hadoop-yetus commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket request to use Cache and DoubleBuffer.
hadoop-yetus commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket request to use Cache and DoubleBuffer. URL: https://github.com/apache/hadoop/pull/1097#issuecomment-512990741 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| 0 | reexec | 40 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 5 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 14 | Maven dependency ordering for branch |
| +1 | mvninstall | 472 | trunk passed |
| +1 | compile | 258 | trunk passed |
| +1 | checkstyle | 74 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 847 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 147 | trunk passed |
| 0 | spotbugs | 312 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 504 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 16 | Maven dependency ordering for patch |
| +1 | mvninstall | 410 | the patch passed |
| +1 | compile | 244 | the patch passed |
| +1 | javac | 244 | the patch passed |
| +1 | checkstyle | 62 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 616 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 148 | the patch passed |
| +1 | findbugs | 514 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 286 | hadoop-hdds in the patch failed. |
| -1 | unit | 1921 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
| | | 6769 | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdds.scm.container.placement.algorithms.TestContainerPlacementFactory |
| | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
| | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
| | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
| | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
| | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
| | hadoop.ozone.client.rpc.TestOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
| | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
| | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
| | hadoop.ozone.container.server.TestSecureContainerServer |
| | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1097 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4013f1963c43 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9838a47 |
| Default Java | 1.8.0_212 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/artifact/out/patch-unit-hadoop-hdds.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/artifact/out/patch-unit-hadoop-ozone.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/testReport/ |
| Max. process+thread count | 5285 (vs. ulimit of 5500) |
| modules | C: hadoop-ozone/common hadoop-ozone/ozone-manager U: hadoop-ozone |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [hadoop] swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API.
swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API. URL: https://github.com/apache/hadoop/pull/1033#discussion_r305115724 ## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/RDBStore.java ## @@ -318,6 +320,44 @@ public CodecRegistry getCodecRegistry() { return codecRegistry; } + @Override + public DBUpdatesWrapper getUpdatesSince(long sequenceNumber) + throws DataNotFoundException { + +DBUpdatesWrapper dbUpdatesWrapper = new DBUpdatesWrapper(); +try { + TransactionLogIterator transactionLogIterator = + db.getUpdatesSince(sequenceNumber); + + boolean flag = true; + + while (transactionLogIterator.isValid()) { Review comment: Does this imply flush to sst can happen while iterating the log?
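For context, the pattern under review walks the transaction log forward from a requested sequence number and accumulates every newer write batch. Below is a minimal plain-Java sketch of that accumulation logic; the `LogEntry` type and `collectUpdatesSince` method are illustrative stand-ins (not the Ozone or RocksDB API), and an in-memory list plays the role of RocksDB's `TransactionLogIterator`:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for gathering WAL updates newer than a sequence number. */
public class DbUpdatesSketch {

    /** Minimal stand-in for one write-batch entry in the transaction log. */
    public static final class LogEntry {
        final long sequenceNumber;
        final String data;
        public LogEntry(long sequenceNumber, String data) {
            this.sequenceNumber = sequenceNumber;
            this.data = data;
        }
    }

    /**
     * Collect every entry with a sequence number strictly greater than
     * sinceSeq, mirroring the isValid()/next() loop in the patch. Throws if
     * sinceSeq predates the oldest retained entry, i.e. the WAL no longer
     * covers the requested range.
     */
    public static List<LogEntry> collectUpdatesSince(List<LogEntry> log, long sinceSeq) {
        if (!log.isEmpty() && sinceSeq < log.get(0).sequenceNumber - 1) {
            throw new IllegalArgumentException(
                "Requested sequence " + sinceSeq + " is no longer in the log");
        }
        List<LogEntry> updates = new ArrayList<>();
        for (LogEntry entry : log) {  // plays the role of isValid()/next()
            if (entry.sequenceNumber > sinceSeq) {
                updates.add(entry);
            }
        }
        return updates;
    }
}
```

The reviewer's question still applies to the real implementation: if a flush to SST truncates the WAL mid-iteration, the iterator can stop being valid before the full range is served, so callers need the not-found error path.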
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305113822 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java ## @@ -0,0 +1,193 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.hadoop.ozone.om; + +import org.apache.commons.lang3.RandomStringUtils; +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.ozone.MiniOzoneCluster; +import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneBucket; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.client.OzoneVolume; +import org.apache.hadoop.ozone.client.VolumeArgs; +import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs; +import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer; +import org.apache.hadoop.utils.db.DBCheckpoint; +import org.apache.hadoop.utils.db.Table; +import org.apache.hadoop.utils.db.TableIterator; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; + +import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey; + +/** + * Tests the Ratis snaphsots feature in OM. + */ +public class TestOMRatisSnapshots { + + private MiniOzoneHAClusterImpl cluster = null; + private ObjectStore objectStore; + private OzoneConfiguration conf; + private String clusterId; + private String scmId; + private int numOfOMs = 3; + private static final long SNAPSHOT_THRESHOLD = 50; + private static final int LOG_PURGE_GAP = 50; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + @Rule + public Timeout timeout = new Timeout(3000_000); + + /** + * Create a MiniDFSCluster for testing. The cluster initially has one + * inactive OM. So at the start of the cluster, there will be 2 active and 1 + * inactive OM. 
+ * + * @throws IOException + */ + @Before + public void init() throws Exception { +conf = new OzoneConfiguration(); +clusterId = UUID.randomUUID().toString(); +scmId = UUID.randomUUID().toString(); +conf.setLong( +OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY, +SNAPSHOT_THRESHOLD); +conf.setInt(OMConfigKeys.OZONE_OM_RATIS_LOG_PURGE_GAP, LOG_PURGE_GAP); +cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf) +.setClusterId(clusterId) +.setScmId(scmId) +.setOMServiceId("om-service-test1") +.setNumOfOzoneManagers(numOfOMs) +.setNumOfActiveOMs(2) +.build(); +cluster.waitForClusterToBeReady(); +objectStore = OzoneClientFactory.getRpcClient(conf).getObjectStore(); + } + + /** + * Shutdown MiniDFSCluster. + */ + @After + public void shutdown() { +if (cluster != null) { + cluster.shutdown(); +} + } + + @Test + public void testInstallSnapshot() throws Exception { +// Get the leader OM +String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider() +.getCurrentProxyOMNodeId(); +OzoneManager leaderOM = cluster.getOzoneManager(leaderOMNodeId); +OzoneManagerRatisServer leaderRatisServer = leaderOM.getOmRatisServer(); + +// Find the inactive OM +String followerNodeId = leaderOM.getPeerNodes().get(0).getOMNodeId(); +if (cluster.isOMActive(followerNodeId)) { + followerNodeId = leaderOM.getPeerNodes().get(1).getOMNodeId(); +} +OzoneManager followerOM = cluster.getOzoneManager(followerNodeId); + +// Do some transactions so that the log index increases +String userName = "user" + RandomStringUtils.randomNumeric(5); +String adminName = "admin" + RandomStringUtils.randomNumeric(5); +String volumeName = "volume" + RandomStringUtils.randomNumeric(5); +
[GitHub] [hadoop] swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API.
swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API. URL: https://github.com/apache/hadoop/pull/1033#discussion_r305113051 ## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/DataNotFoundException.java ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package org.apache.hadoop.utils.db; + +import java.io.IOException; + +/** + * Thrown if RocksDB is unable to find requested data from WAL file. + */ +public class DataNotFoundException extends IOException { Review comment: or just SequenceNumberNotFoundException?
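One way to act on the review suggestion is to narrow the exception and carry the requested sequence number so the client side can log it. This is a sketch of that idea under the reviewer's proposed name, not the class that was actually committed:

```java
import java.io.IOException;

/**
 * Sketch of the reviewers' suggestion: a narrower exception that records the
 * sequence number RocksDB could not serve from its WAL. Name and shape are
 * illustrative only.
 */
public class SequenceNumberNotFoundException extends IOException {

    private final long sequenceNumber;

    public SequenceNumberNotFoundException(long sequenceNumber) {
        super("Unable to read transaction log data from sequence number "
            + sequenceNumber);
        this.sequenceNumber = sequenceNumber;
    }

    /** The sequence number the caller asked for; useful for client-side logging. */
    public long getSequenceNumber() {
        return sequenceNumber;
    }
}
```

Keeping it a subclass of `IOException` preserves compatibility with callers that already catch the broader type.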
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305113075 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot installation.");
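The backup-then-swap dance in the hunk above can be sketched with plain `java.nio` file moves. The `om.db.backup_` prefix and the helper names here are illustrative (the real code uses `OzoneConsts.OM_DB_BACKUP_PREFIX` and lives inside `OzoneManager`), and the real method must also restore the backup if promoting the checkpoint fails:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of swapping a DB directory for a downloaded checkpoint, keeping a backup. */
public class CheckpointSwapSketch {

    /** Backup directory name encoding the last applied index and a timestamp. */
    public static String backupNameFor(long lastAppliedIndex, long timestampMillis) {
        return "om.db.backup_" + lastAppliedIndex + "_" + timestampMillis;
    }

    /**
     * Move the current DB aside under a backup name, then move the checkpoint
     * into the DB's place. Rolls the backup back into place if the second move
     * fails, and returns the backup location so a caller can clean it up later.
     */
    public static Path replaceDbWithCheckpoint(Path currentDb, Path checkpoint,
            long lastAppliedIndex) throws IOException {
        Path backup = currentDb.resolveSibling(
            backupNameFor(lastAppliedIndex, System.currentTimeMillis()));
        Files.move(currentDb, backup);          // back up the live DB
        try {
            Files.move(checkpoint, currentDb);  // promote the checkpoint
        } catch (IOException e) {
            Files.move(backup, currentDb);      // roll back on failure
            throw e;
        }
        return backup;
    }
}
```

The ordering matters: the store must be closed before either move, and the backup must land before the checkpoint is promoted, so a crash at any point leaves a usable DB directory on disk.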
[GitHub] [hadoop] hadoop-yetus commented on issue #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes
hadoop-yetus commented on issue #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes URL: https://github.com/apache/hadoop/pull/1120#issuecomment-512985848 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:--------|
| 0 | reexec | 72 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 504 | trunk passed |
| +1 | compile | 262 | trunk passed |
| +1 | checkstyle | 72 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 956 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 176 | trunk passed |
| 0 | spotbugs | 364 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 601 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 449 | the patch passed |
| +1 | compile | 275 | the patch passed |
| +1 | javac | 275 | the patch passed |
| +1 | checkstyle | 79 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 754 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 177 | the patch passed |
| +1 | findbugs | 620 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 346 | hadoop-hdds in the patch failed. |
| -1 | unit | 2070 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 56 | The patch does not generate ASF License warnings. |
| | | 7621 | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdds.scm.block.TestBlockManager |
| | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
| | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
| | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
| | hadoop.ozone.TestStorageContainerManager |
| | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
| | hadoop.ozone.client.rpc.TestOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
| | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
| | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
| | hadoop.ozone.container.server.TestSecureContainerServer |
| | hadoop.hdds.scm.pipeline.TestSCMPipelineManager |
| | hadoop.ozone.client.rpc.TestWatchForCommit |
| | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
| | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
| | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=18.09.7 Server=18.09.7 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1120 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 61e5691f22ee 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9838a47 |
| Default Java | 1.8.0_212 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/artifact/out/patch-unit-hadoop-hdds.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/artifact/out/patch-unit-hadoop-ozone.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/testReport/ |
| Max. process+thread count | 4659 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [hadoop] swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API.
swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API. URL: https://github.com/apache/hadoop/pull/1033#discussion_r305112491 ## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/DataNotFoundException.java ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + */ + +package org.apache.hadoop.utils.db; + +import java.io.IOException; + +/** + * Thrown if RocksDB is unable to find requested data from WAL file. + */ +public class DataNotFoundException extends IOException { Review comment: The exception seems too broad, can be DataNotFoundForSequenceNumberException? The message can, therefore, ask for the sequence number and client-side code can log it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305108487 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java ## @@ -0,0 +1,193 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.hadoop.ozone.om; + +import org.apache.commons.lang3.RandomStringUtils; +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.ozone.MiniOzoneCluster; +import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneBucket; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.client.OzoneVolume; +import org.apache.hadoop.ozone.client.VolumeArgs; +import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs; +import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer; +import org.apache.hadoop.utils.db.DBCheckpoint; +import org.apache.hadoop.utils.db.Table; +import org.apache.hadoop.utils.db.TableIterator; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; + +import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey; + +/** + * Tests the Ratis snaphsots feature in OM. + */ +public class TestOMRatisSnapshots { + + private MiniOzoneHAClusterImpl cluster = null; + private ObjectStore objectStore; + private OzoneConfiguration conf; + private String clusterId; + private String scmId; + private int numOfOMs = 3; + private static final long SNAPSHOT_THRESHOLD = 50; + private static final int LOG_PURGE_GAP = 50; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + @Rule + public Timeout timeout = new Timeout(3000_000); + + /** + * Create a MiniDFSCluster for testing. The cluster initially has one Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [hadoop] hanishakoneru commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#issuecomment-512981397 > **Question:** > In ShouldInstallSnapshot, it calls getLatestSnapshot() from stateMachineStorage, as we have our own snapshot implementation in stateMachine, do we need to override that method to provide correct snapshotInfo? Or could you provide some info on how this works? We do not want the snapshots to be handled via Ratis. When a follower receives an installSnapshot notification, it sends the newly loaded DB's snapshot index back to the leader, and the leader updates the follower's snapshot index through this. But when the Ratis server is starting up, it should be able to determine the latest snapshot index; otherwise, all the logs will be replayed from the start. I will create a new Jira to address this. Thanks, Bharat.
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305104705 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer ratisServer) { ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true) .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build(); this.executorService = HadoopExecutors.newSingleThreadExecutor(build); +this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor(); } /** * Initializes the State Machine with the given server, group and storage. * TODO: Load the latest snapshot from the file system. Review comment: Correction: On reloading state, we should not read the saved snapshot index. Instead, we should update the snapshot index on disk. During normal startup, we already read the saved snapshot index.
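The distinction in this correction — read the persisted index on normal startup, but overwrite it when reloading state from an installed checkpoint — can be sketched as follows. This is a hypothetical standalone model with invented names, not the OzoneManagerStateMachine code:

```java
// Toy model of the review comment above: invented names, not the OM code.
public class SnapshotIndexSketch {
  private long onDiskIndex; // stands in for the snapshot index file on disk

  public SnapshotIndexSketch(long persistedIndex) {
    this.onDiskIndex = persistedIndex;
  }

  /** Normal startup: read the saved snapshot index from disk. */
  public long normalStartup() {
    return onDiskIndex;
  }

  /** Reload after installing a checkpoint: update the index on disk
   *  instead of reading the stale saved value. */
  public long reloadWithCheckpoint(long checkpointIndex) {
    onDiskIndex = checkpointIndex;
    return onDiskIndex;
  }

  public static void main(String[] args) {
    SnapshotIndexSketch sm = new SnapshotIndexSketch(7);
    System.out.println(sm.normalStartup());          // 7: read from disk
    System.out.println(sm.reloadWithCheckpoint(99)); // 99: overwritten by checkpoint
  }
}
```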
[GitHub] [hadoop] xiaoyuyao merged pull request #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes
xiaoyuyao merged pull request #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes URL: https://github.com/apache/hadoop/pull/1120
[GitHub] [hadoop] xiaoyuyao commented on issue #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes
xiaoyuyao commented on issue #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes URL: https://github.com/apache/hadoop/pull/1120#issuecomment-512974489 +1. Thanks @adoroszlai for fixing this. I will merge this shortly.
[jira] [Comment Edited] (HADOOP-16311) Hadoop build failure - natively on ARM (armv7) - oom_listener_main.c issues
[ https://issues.apache.org/jira/browse/HADOOP-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888027#comment-16888027 ] AiBe Gee edited comment on HADOOP-16311 at 7/18/19 8:24 PM: [~tonyharvey] Sorry for the late reply, haven't checked the E-Mail I used for my Jira registration until today. I put this on resolved because I learned that it was a duplicate: https://issues.apache.org/jira/browse/YARN-8498 and not because I managed to resolve it. Unfortunately, the patches presented in: https://issues.apache.org/jira/browse/YARN-8498 are not working and I went for Hadoop 2.9.2, which I was able to build successfully on Pi2B. Same for HBase - I picked 1.4.9, Hive - 2.3.5 and Phoenix 4.14.2 The latest versions for the packages I presented above don't seem to work on ARM, the maven build scripts are downloading some X86 stuff and the builds are failing. See: https://issues.apache.org/jira/browse/HADOOP-16309 Off-Topic, just to help you, these are my notes for the hadoop 2.9.2 build on Raspberry Pi2 ARMv7 using Slackware Linux 14.2: - protobuf 2.5.0 required ! 
wget https://github.com/apache/hadoop/archive/rel/release-2.9.2.tar.gz
tar -xzpf release-2.9.2.tar.gz
cd hadoop-rel-release-2.9.2/

- swap - using external HDD:
swapoff /dev/whatever-partition-is-actually-the-swap
mkswap /dev/sda1
swapon /dev/sda1
echo 1 > /proc/sys/vm/swappiness

- Environment:
export PATH="$PATH:/opt/java/bin"
export M2_HOME=/opt/apache-maven-3.6.1
export "PATH=$PATH:$M2_HOME/bin"
JAVA_HOME=/opt/java
export JAVA_HOME
export ARCH=arm
export CFLAGS="-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mvectorize-with-neon-quad -mfloat-abi=hard"
export CXXFLAGS="-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mvectorize-with-neon-quad -mfloat-abi=hard"
export CPPFLAGS="-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -mvectorize-with-neon-quad -mfloat-abi=hard"
export MAKEFLAGS="-j 3"

- need to patch pom.xml - add to pom.xml: org.apache.maven.plugins maven-surefire-plugin 3.0.0-M3 false

- needs this too - only on ARM: https://issues.apache.org/jira/browse/HADOOP-9320 - patch - v2.8.patch
cd /kit/hadoop-rel-release-2.9.2/hadoop-common-project/hadoop-common/
wget https://patch-diff.githubusercontent.com/raw/apache/hadoop/pull/224.patch
patch < 224.patch

- this patch too: https://issues.apache.org/jira/browse/HADOOP-14597
cd /kit/hadoop-rel-release-2.9.2/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/
wget [^HADOOP-14597.04.patch]
patch < HADOOP-14597.04.patch
cd /kit/hadoop-rel-release-2.9.2/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/
wget [^HADOOP-14597.04.patch]
patch < HADOOP-14597.04.patch

Build:
cd /kit/hadoop-rel-release-2.9.2/
nohup mvn package -Pdist,native,docs -DskipTests -Dtar 2>&1 | tee hadoop-2-9-2-build.log
cp /kit/hadoop-rel-release-2.9.2/hadoop-dist/target/hadoop-2.9.2.tar.gz /kit/

Hope it helps. P.S. Edit - still profoundly horrified/disgusted over how jira works - autoformatting, worse than Redmond Word!
I have edited my post several times and there are still some links to some patches broken, sorry, I lost patience correcting all the automated crap. A pity using this impossible tool for such a great project like hadoop ...
[GitHub] [hadoop] hadoop-yetus commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…
hadoop-yetus commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli… URL: https://github.com/apache/hadoop/pull/1124#issuecomment-512967762

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
| 0 | reexec | 36 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 466 | trunk passed |
| +1 | compile | 265 | trunk passed |
| +1 | checkstyle | 73 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 868 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 146 | trunk passed |
| 0 | spotbugs | 310 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 501 | trunk passed |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 443 | the patch passed |
| +1 | compile | 240 | the patch passed |
| +1 | javac | 240 | the patch passed |
| +1 | checkstyle | 67 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 625 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 152 | the patch passed |
| +1 | findbugs | 513 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 281 | hadoop-hdds in the patch failed. |
| -1 | unit | 1480 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
| | | 6367 | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdds.scm.container.placement.algorithms.TestContainerPlacementFactory |
| | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
| | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
| | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
| | hadoop.ozone.client.rpc.TestOzoneRpcClient |
| | hadoop.ozone.container.server.TestSecureContainerServer |
| | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
| | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
| | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
| | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
| | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
| | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
| | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
| | hadoop.ozone.client.rpc.TestCommitWatcher |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1124 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 9df3745bafda 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9838a47 |
| Default Java | 1.8.0_212 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/artifact/out/patch-unit-hadoop-hdds.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/artifact/out/patch-unit-hadoop-ozone.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/testReport/ |
| Max. process+thread count | 5170 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/client U: hadoop-hdds/client |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [hadoop] bharatviswa504 commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket request to use Cache and DoubleBuffer.
bharatviswa504 commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket request to use Cache and DoubleBuffer. URL: https://github.com/apache/hadoop/pull/1097#issuecomment-512952595 This is ready for review. Rebased with the latest trunk, as now HDDS-1689 got checked in.
[GitHub] [hadoop] adoroszlai commented on a change in pull request #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes
adoroszlai commented on a change in pull request #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes URL: https://github.com/apache/hadoop/pull/1120#discussion_r305071450 ## File path: hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestContainerPlacementFactory.java ## @@ -75,50 +69,25 @@ public void setup() { // Totally 3 racks, each has 5 datanodes DatanodeDetails node = TestUtils.createDatanodeDetails( hostname + i, rack + (i / 5)); - datanodes.add(node); cluster.add(node); } // create mock node manager -nodeManager = Mockito.mock(NodeManager.class); -when(nodeManager.getNodes(NodeState.HEALTHY)) -.thenReturn(new ArrayList<>(datanodes)); -when(nodeManager.getNodeStat(anyObject())) -.thenReturn(new SCMNodeMetric(storageCapacity, 0L, 100L)); -when(nodeManager.getNodeStat(datanodes.get(2))) -.thenReturn(new SCMNodeMetric(storageCapacity, 90L, 10L)); -when(nodeManager.getNodeStat(datanodes.get(3))) -.thenReturn(new SCMNodeMetric(storageCapacity, 80L, 20L)); -when(nodeManager.getNodeStat(datanodes.get(4))) -.thenReturn(new SCMNodeMetric(storageCapacity, 70L, 30L)); - } - +NodeManager nodeManager = Mockito.mock(NodeManager.class); - @Test - public void testDefaultPolicy() throws IOException { ContainerPlacementPolicy policy = ContainerPlacementPolicyFactory .getPolicy(conf, nodeManager, cluster, true); - Review comment: Thanks @xiaoyuyao for the suggestion, it's implemented in the latest commit.
[GitHub] [hadoop] hadoop-yetus commented on issue #1123: HADOOP-16380 tombstones
hadoop-yetus commented on issue #1123: HADOOP-16380 tombstones URL: https://github.com/apache/hadoop/pull/1123#issuecomment-512944956

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
| 0 | reexec | 82 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 1 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 89 | Maven dependency ordering for branch |
| +1 | mvninstall | 1257 | trunk passed |
| +1 | compile | 1222 | trunk passed |
| +1 | checkstyle | 143 | trunk passed |
| +1 | mvnsite | 127 | trunk passed |
| +1 | shadedclient | 983 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 97 | trunk passed |
| 0 | spotbugs | 65 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 185 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 23 | Maven dependency ordering for patch |
| +1 | mvninstall | 78 | the patch passed |
| +1 | compile | 1266 | the patch passed |
| +1 | javac | 1266 | the patch passed |
| +1 | checkstyle | 140 | the patch passed |
| +1 | mvnsite | 122 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 717 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 99 | the patch passed |
| +1 | findbugs | 199 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 528 | hadoop-common in the patch passed. |
| +1 | unit | 289 | hadoop-aws in the patch passed. |
| +1 | asflicense | 47 | The patch does not generate ASF License warnings. |
| | | 7684 | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1123/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1123 |
| Optional Tests | dupname asflicense mvnsite compile javac javadoc mvninstall unit shadedclient findbugs checkstyle |
| uname | Linux 6b218744fdc8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9838a47 |
| Default Java | 1.8.0_212 |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1123/1/testReport/ |
| Max. process+thread count | 1387 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1123/1/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [hadoop] adoroszlai removed a comment on issue #1122: YARN-9679. Regular code cleanup in TestResourcePluginManager
adoroszlai removed a comment on issue #1122: YARN-9679. Regular code cleanup in TestResourcePluginManager URL: https://github.com/apache/hadoop/pull/1122#issuecomment-512936966 rebuild
[GitHub] [hadoop] adoroszlai commented on issue #1122: YARN-9679. Regular code cleanup in TestResourcePluginManager
adoroszlai commented on issue #1122: YARN-9679. Regular code cleanup in TestResourcePluginManager URL: https://github.com/apache/hadoop/pull/1122#issuecomment-512936966 rebuild
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305056316 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java ## @@ -0,0 +1,193 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.hadoop.ozone.om; + +import org.apache.commons.lang3.RandomStringUtils; +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.ozone.MiniOzoneCluster; +import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneBucket; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.client.OzoneVolume; +import org.apache.hadoop.ozone.client.VolumeArgs; +import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs; +import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer; +import org.apache.hadoop.utils.db.DBCheckpoint; +import org.apache.hadoop.utils.db.Table; +import org.apache.hadoop.utils.db.TableIterator; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; + +import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey; + +/** + * Tests the Ratis snaphsots feature in OM. + */ +public class TestOMRatisSnapshots { + + private MiniOzoneHAClusterImpl cluster = null; + private ObjectStore objectStore; + private OzoneConfiguration conf; + private String clusterId; + private String scmId; + private int numOfOMs = 3; + private static final long SNAPSHOT_THRESHOLD = 50; + private static final int LOG_PURGE_GAP = 50; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + @Rule + public Timeout timeout = new Timeout(3000_000); + + /** + * Create a MiniDFSCluster for testing. The cluster initially has one + * inactive OM. So at the start of the cluster, there will be 2 active and 1 + * inactive OM. 
+ * + * @throws IOException + */ + @Before + public void init() throws Exception { +conf = new OzoneConfiguration(); +clusterId = UUID.randomUUID().toString(); +scmId = UUID.randomUUID().toString(); +conf.setLong( +OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY, +SNAPSHOT_THRESHOLD); +conf.setInt(OMConfigKeys.OZONE_OM_RATIS_LOG_PURGE_GAP, LOG_PURGE_GAP); +cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf) +.setClusterId(clusterId) +.setScmId(scmId) +.setOMServiceId("om-service-test1") +.setNumOfOzoneManagers(numOfOMs) +.setNumOfActiveOMs(2) +.build(); +cluster.waitForClusterToBeReady(); +objectStore = OzoneClientFactory.getRpcClient(conf).getObjectStore(); + } + + /** + * Shutdown MiniDFSCluster. + */ + @After + public void shutdown() { +if (cluster != null) { + cluster.shutdown(); +} + } + + @Test + public void testInstallSnapshot() throws Exception { +// Get the leader OM +String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider() +.getCurrentProxyOMNodeId(); +OzoneManager leaderOM = cluster.getOzoneManager(leaderOMNodeId); +OzoneManagerRatisServer leaderRatisServer = leaderOM.getOmRatisServer(); + +// Find the inactive OM +String followerNodeId = leaderOM.getPeerNodes().get(0).getOMNodeId(); +if (cluster.isOMActive(followerNodeId)) { + followerNodeId = leaderOM.getPeerNodes().get(1).getOMNodeId(); +} +OzoneManager followerOM = cluster.getOzoneManager(followerNodeId); + +// Do some transactions so that the log index increases +String userName = "user" + RandomStringUtils.randomNumeric(5); +String adminName = "admin" + RandomStringUtils.randomNumeric(5); +String volumeName = "volume" + RandomStringUtils.randomNumeric(5); +
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305053446 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot
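The backup-and-replace step in the replaceOMDBWithCheckpoint code quoted above (close the store, move the current DB directory aside under an index-and-timestamp backup name, then promote the downloaded checkpoint) can be sketched in isolation with java.nio.file. This is a hypothetical standalone model — DbSwapSketch and its method names are invented here and this is not the actual OzoneManager code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical standalone sketch of the DB swap step; not the OM implementation.
public class DbSwapSketch {

  /**
   * Move currentDb aside to an index-and-timestamp-named backup, then move the
   * downloaded checkpoint into its place. Returns the backup location so the
   * caller can roll back if reloading the new DB fails.
   */
  public static Path replaceWithCheckpoint(Path currentDb, Path checkpoint,
      long lastAppliedIndex) throws IOException {
    Path backup = currentDb.resolveSibling(
        "om.db.backup_" + lastAppliedIndex + "_" + System.currentTimeMillis());
    Files.move(currentDb, backup);     // keep the old DB for rollback
    Files.move(checkpoint, currentDb); // promote the checkpoint
    return backup;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("omswap");
    Path db = Files.createDirectory(dir.resolve("om.db"));
    Files.writeString(db.resolve("state"), "old");
    Path ckpt = Files.createDirectory(dir.resolve("checkpoint"));
    Files.writeString(ckpt.resolve("state"), "new");

    Path backup = replaceWithCheckpoint(db, ckpt, 42L);
    System.out.println(Files.readString(db.resolve("state")));     // new
    System.out.println(Files.readString(backup.resolve("state"))); // old
  }
}
```

Moving the old DB aside rather than deleting it mirrors the rollback path in the quoted patch: if reloading state from the new checkpoint fails, the backup can be moved back into place.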
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305053446 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot
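The quoted diff combines two small pieces of logic: the guard in installSnapshot (only install a checkpoint whose snapshot index is ahead of the locally applied log) and the backup naming in replaceOMDBWithCheckpoint (prefix + lastAppliedIndex + "_" + timestamp before moving the checkpoint into place). A minimal standalone sketch of both, with the caveat that the class name and the "om.db.backup_" prefix are assumptions for illustration (the real value of OzoneConsts.OM_DB_BACKUP_PREFIX is not shown in the diff):

```java
public class InstallSnapshotSketch {
    /** Mirrors the guard in installSnapshot: a checkpoint is only
     *  installed if it is strictly ahead of the local applied index. */
    static boolean shouldInstall(long lastAppliedIndex, long checkpointSnapshotIndex) {
        return checkpointSnapshotIndex > lastAppliedIndex;
    }

    /** Backup name pattern from replaceOMDBWithCheckpoint:
     *  prefix + lastAppliedIndex + "_" + timestampMillis. */
    static String backupName(String prefix, long lastAppliedIndex, long nowMillis) {
        return prefix + lastAppliedIndex + "_" + nowMillis;
    }

    public static void main(String[] args) {
        // Stale or equal checkpoints are rejected; only a newer one is installed.
        assert !shouldInstall(100, 90);
        assert !shouldInstall(100, 100);
        assert shouldInstall(100, 150);
        // The old DB is renamed with this pattern before the checkpoint is moved in.
        assert backupName("om.db.backup_", 100, 1000).equals("om.db.backup_100_1000");
        System.out.println("ok");
    }
}
```

Keeping the old DB under a timestamped backup name (rather than deleting it) means a failed swap can be rolled back, which is why the diff moves the live DB aside before installing the checkpoint.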
[GitHub] [hadoop] avijayanhwx commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…
avijayanhwx commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli… URL: https://github.com/apache/hadoop/pull/1124#issuecomment-512929603 /label ozone This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305052177 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot
[GitHub] [hadoop] avijayanhwx opened a new pull request #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…
avijayanhwx opened a new pull request #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli… URL: https://github.com/apache/hadoop/pull/1124 …ne for reads. Currently the list of nodes returned by SCM is static and is returned in the same order to all clients. Ideally these should be sorted by network topology and then returned to the client. However, even when network topology is not available, the SCM/client should randomly sort the nodes before choosing the replicas to connect to.
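The idea in the PR description — shuffle the replica list on the client so reads spread across datanodes instead of all hitting the first node — can be sketched as below. This is not the actual HDDS-1749 change; the class, the node names, and the seeded Random (used only to keep the sketch deterministic) are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReplicaShuffleSketch {
    /** Return a shuffled copy of the replica list; the input is left untouched
     *  so the original (e.g. topology-sorted) order is still available. */
    static List<String> shuffledReplicas(List<String> nodes, Random rng) {
        List<String> copy = new ArrayList<>(nodes);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
        List<String> picked = shuffledReplicas(nodes, new Random(42));
        // Same membership, possibly different order; original list unchanged.
        assert picked.size() == 3 && picked.containsAll(nodes);
        assert nodes.equals(Arrays.asList("dn1", "dn2", "dn3"));
        System.out.println(picked);
    }
}
```

Shuffling per client call is what breaks the "same order to all clients" hotspot: each reader then picks a uniformly random first replica.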
[GitHub] [hadoop] goiri commented on issue #1040: HDFS-13693. Remove unnecessary search in INodeDirectory.addChild during image loa…
goiri commented on issue #1040: HDFS-13693. Remove unnecessary search in INodeDirectory.addChild during image loa… URL: https://github.com/apache/hadoop/pull/1040#issuecomment-512928429 The parallel life of JIRAs and PRs is driving me a little crazy. We have both the patch and the diff here and then we also have comments in both. Anyway, He Xiaoqiao seems to have comments in the JIRA. It would be good to get his +1.
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305048987 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305047773 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305046862 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305044258 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java ## @@ -159,15 +159,18 @@ public void decNumKeys() { } public void setNumVolumes(long val) { -this.numVolumes.incr(val); +long oldVal = this.numVolumes.value(); +this.numVolumes.incr(val - oldVal); Review comment: Got it.
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305041874 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. 
The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, Review comment: If checkpointSnapshotIndex <= lastAppliedIndex, I think we need to clean up the downloaded DB checkpoint here.
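bharatviswa504's cleanup suggestion amounts to recursively deleting the downloaded checkpoint directory when it turns out to be stale. A hedged Java sketch of that cleanup (CheckpointCleanup and cleanupCheckpoint are hypothetical names for illustration, not code from the PR):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Illustrative helper: delete a stale, already-downloaded DB checkpoint. */
public class CheckpointCleanup {

  static void cleanupCheckpoint(Path checkpointDir) throws IOException {
    if (checkpointDir == null || !Files.exists(checkpointDir)) {
      return;
    }
    // Walk depth-first in reverse order so files are deleted before the
    // directories that contain them.
    try (Stream<Path> walk = Files.walk(checkpointDir)) {
      walk.sorted(Comparator.reverseOrder())
          .map(Path::toFile)
          .forEach(File::delete);
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a downloaded checkpoint directory with one file in it.
    Path dir = Files.createTempDirectory("om.db.checkpoint");
    Files.createFile(dir.resolve("CURRENT"));
    cleanupCheckpoint(dir);
    System.out.println(Files.exists(dir)); // false after cleanup
  }
}
```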
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305041731 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java ## @@ -159,15 +159,18 @@ public void decNumKeys() { } public void setNumVolumes(long val) { -this.numVolumes.incr(val); +long oldVal = this.numVolumes.value(); +this.numVolumes.incr(val - oldVal); Review comment: Let's say numVolumes = 10. After that, the OM is restarted or the state is reloaded from a new DB checkpoint. Now the number of volumes in the VolumeTable is 20. If we increment the numVolumes metric by 20, the metric will show the total number of volumes as 30, whereas it should be only 20.
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305039831 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1630,6 +1630,14 @@ Byte limit for Raft's Log Worker queue. + +ozone.om.ratis.log.purge.gap +1024 +OZONE, OM, RATIS +The minimum gap between log indices for Raft server to purge Review comment: Agree. Will update it.
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305039875 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java ## @@ -159,15 +159,18 @@ public void decNumKeys() { } public void setNumVolumes(long val) { -this.numVolumes.incr(val); +long oldVal = this.numVolumes.value(); +this.numVolumes.incr(val - oldVal); Review comment: Then why do we do this: this.numVolumes.incr(val - oldVal)? Sorry, I still don't get why we are doing this.
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305039471 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1630,6 +1630,14 @@ Byte limit for Raft's Log Worker queue. + +ozone.om.ratis.log.purge.gap +1024 +OZONE, OM, RATIS +The minimum gap between log indices for Raft server to purge Review comment: Agree. Will update it.
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305038252 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer ratisServer) { ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true) .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build(); this.executorService = HadoopExecutors.newSingleThreadExecutor(build); +this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor(); Review comment: Shutdown of this executor needs to be done in StateMachine stop.
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305035508 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -190,6 +197,27 @@ public TransactionContext startTransaction( } } + @Override + public void pause() { +lifeCycle.transition(LifeCycle.State.PAUSING); Review comment: It is taken care of internally by Ratis. The StateMachineUpdater in Ratis checks if the state is RUNNING before applying log entries to the StateMachine.
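hanishakoneru's explanation can be modeled in a few lines: the updater thread consults the lifecycle state before applying each log entry, so transitioning the machine to a paused state is enough to stop entries from being applied. A toy Java model of that guard (PausableStateMachine is illustrative only; it is not the Ratis API):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a state machine whose updater skips entries while paused. */
public class PausableStateMachine {
  enum State { RUNNING, PAUSED }

  private volatile State state = State.RUNNING;
  final List<String> applied = new ArrayList<>();

  void pause() { state = State.PAUSED; }

  void unpause() { state = State.RUNNING; }

  /** The updater only applies an entry when the machine is RUNNING. */
  boolean applyIfRunning(String logEntry) {
    if (state != State.RUNNING) {
      return false; // entry stays in the log; it is not applied
    }
    applied.add(logEntry);
    return true;
  }

  public static void main(String[] args) {
    PausableStateMachine sm = new PausableStateMachine();
    System.out.println(sm.applyIfRunning("tx-1")); // true
    sm.pause();
    System.out.println(sm.applyIfRunning("tx-2")); // false: paused, skipped
    sm.unpause();
    System.out.println(sm.applyIfRunning("tx-3")); // true
  }
}
```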
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305032543 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java ## @@ -159,15 +159,18 @@ public void decNumKeys() { } public void setNumVolumes(long val) { -this.numVolumes.incr(val); +long oldVal = this.numVolumes.value(); +this.numVolumes.incr(val - oldVal); Review comment: setNumVolumes is called with the total number of rows in the VolumeTable, not with the difference between the old value and the new value: metrics.setNumVolumes(metadataManager.countRowsInTable(metadataManager .getVolumeTable()));
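The exchange above boils down to one trick: the metrics counter only exposes incr(), so an absolute "set" has to be emulated by incrementing by the delta from the current value. A minimal, self-contained Java sketch of the idea (MutableCounter and GaugeViaCounter are illustrative stand-ins, not the real org.apache.hadoop.metrics2 classes):

```java
/** Minimal stand-in for a metrics counter that only supports incr(delta). */
public class GaugeViaCounter {

  static class MutableCounter {
    private long value;
    void incr(long delta) { value += delta; }
    long value() { return value; }
  }

  final MutableCounter numVolumes = new MutableCounter();

  /** Set the metric to an absolute value by incrementing by the difference. */
  void setNumVolumes(long val) {
    long oldVal = numVolumes.value();
    numVolumes.incr(val - oldVal);
  }

  public static void main(String[] args) {
    GaugeViaCounter metrics = new GaugeViaCounter();
    metrics.setNumVolumes(10);                      // initial load: 10 volumes
    metrics.setNumVolumes(20);                      // reload: table now has 20 rows
    System.out.println(metrics.numVolumes.value()); // 20, not 30
  }
}
```

This is exactly the scenario hanishakoneru describes: after a reload the caller passes the new absolute row count, and the delta-based set keeps the metric at 20 instead of accumulating to 30.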
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305030842 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer ratisServer) { ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true) .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build(); this.executorService = HadoopExecutors.newSingleThreadExecutor(build); +this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor(); } /** * Initializes the State Machine with the given server, group and storage. * TODO: Load the latest snapshot from the file system. Review comment: Yes, thanks for catching this. On startup, we should read the saved ratis snapshot index from disk. I will update the patch.
[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305030343 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1630,6 +1630,14 @@ Byte limit for Raft's Log Worker queue. + +ozone.om.ratis.log.purge.gap +1024 +OZONE, OM, RATIS +The minimum gap between log indices for Raft server to purge Review comment: 1024 transactions is 100ms worth of edits in a busy cluster. We could set this as high as 1M maybe to keep more history. :)
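If the gap is raised as arp7 suggests, the override would live in the cluster's ozone-site.xml. The property name comes from the diff above; the 1000000 value is only arp7's ballpark figure, not a tested recommendation:

```xml
<property>
  <name>ozone.om.ratis.log.purge.gap</name>
  <value>1000000</value>
  <tag>OZONE, OM, RATIS</tag>
  <description>The minimum gap between log indices for the Raft server to
    purge already-snapshotted log entries. A larger gap keeps more log
    history at the cost of disk space.</description>
</property>
```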
[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305029959 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1630,6 +1630,14 @@ Byte limit for Raft's Log Worker queue. + +ozone.om.ratis.log.purge.gap +1024 +OZONE, OM, RATIS +The minimum gap between log indices for Raft server to purge Review comment: Let's set this to a higher value. We don't need to be too aggressive about purging Ratis logs.
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305029702 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -1223,6 +1231,14 @@ public void start() throws IOException { DefaultMetricsSystem.initialize("OzoneManager"); +// Start Ratis services +if (omRatisServer != null) { Review comment: If ratis is not enabled. This is for the non-HA code path.
[GitHub] [hadoop] arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#issuecomment-512908324 I am mostly +1 on this change. A couple of minor comments, and one thing I requested Bharat to double-check.
[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305028836 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -1630,6 +1630,14 @@ Byte limit for Raft's Log Worker queue. + +ozone.om.ratis.log.purge.gap +1024 +OZONE, OM, RATIS +The minimum gap between log indices for Raft server to purge Review comment: No. When a snapshot is being taken, the logs are purged only if the gap since the last purge is more than 1024 indices. The snapshot frequency does not depend on this setting.
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305019890 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -190,6 +197,27 @@ public TransactionContext startTransaction( } } + @Override + public void pause() { +lifeCycle.transition(LifeCycle.State.PAUSING); Review comment: We have set the lifeCycle state here, but I don't see how this will pause the stateMachine, as this state is not used anywhere else except during initialize and unpause.
[GitHub] [hadoop] bharatviswa504 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#issuecomment-512907214 **Question:** In ShouldInstallSnapshot, it calls getLatestSnapshot() from stateMachineStorage. As we have our own snapshot implementation in the stateMachine, do we need to override that method to provide the correct snapshotInfo? Or could you provide some info on how this works?
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r304690480 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java ## @@ -159,15 +159,18 @@ public void decNumKeys() { } public void setNumVolumes(long val) { -this.numVolumes.incr(val); +long oldVal = this.numVolumes.value(); +this.numVolumes.incr(val - oldVal); Review comment: I don't understand why we are subtracting here, after the reload?
[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305027589 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java ## @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer ratisServer) { ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true) .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build(); this.executorService = HadoopExecutors.newSingleThreadExecutor(build); +this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor(); } /** * Initializes the State Machine with the given server, group and storage. * TODO: Load the latest snapshot from the file system. Review comment: This TODO looks a little worrying. Something we need to address now?
[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r304689358 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java ## @@ -0,0 +1,193 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + * License for the specific language governing permissions and limitations under + * the License. 
+ */ +package org.apache.hadoop.ozone.om; + +import org.apache.commons.lang3.RandomStringUtils; +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.ozone.MiniOzoneCluster; +import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneBucket; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.client.OzoneVolume; +import org.apache.hadoop.ozone.client.VolumeArgs; +import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs; +import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer; +import org.apache.hadoop.utils.db.DBCheckpoint; +import org.apache.hadoop.utils.db.Table; +import org.apache.hadoop.utils.db.TableIterator; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.UUID; + +import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey; + +/** + * Tests the Ratis snaphsots feature in OM. + */ +public class TestOMRatisSnapshots { + + private MiniOzoneHAClusterImpl cluster = null; + private ObjectStore objectStore; + private OzoneConfiguration conf; + private String clusterId; + private String scmId; + private int numOfOMs = 3; + private static final long SNAPSHOT_THRESHOLD = 50; + private static final int LOG_PURGE_GAP = 50; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + @Rule + public Timeout timeout = new Timeout(3000_000); + + /** + * Create a MiniDFSCluster for testing. The cluster initially has one Review comment: Minor: MiniOzoneCluster This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state URL: https://github.com/apache/hadoop/pull/948#discussion_r305026829 ## File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java ## @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List acls) throws IOException { } } + /** + * Download and install latest checkpoint from leader OM. + * If the download checkpoints snapshot index is greater than this OM's + * last applied transaction index, then re-initialize the OM state via this + * checkpoint. Before re-initializing OM state, the OM Ratis server should + * be stopped so that no new transactions can be applied. + * @param leaderId peerNodeID of the leader OM + * @return If checkpoint is installed, return the corresponding termIndex. + * Otherwise, return null. + */ + public TermIndex installSnapshot(String leaderId) { +if (omSnapshotProvider == null) { + LOG.error("OM Snapshot Provider is not configured as there are no peer " + + "nodes."); + return null; +} + +DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId); + +// Check if current ratis log index is smaller than the downloaded +// snapshot index. If yes, proceed by stopping the ratis server so that +// the OM state can be re-initialized. If no, then do not proceed with +// installSnapshot. +long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex(); +long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex(); +if (checkpointSnapshotIndex <= lastAppliedIndex) { + LOG.error("Failed to install checkpoint from OM leader: {}. The last " + + "applied index: {} is greater than or equal to the checkpoint's " + + "snapshot index: {}", leaderId, lastAppliedIndex, + checkpointSnapshotIndex); + return null; +} + +// Pause the State Machine so that no new transactions can be applied. 
+// This action also clears the OM Double Buffer so that if there are any +// pending transactions in the buffer, they are discarded. +// TODO: The Ratis server should also be paused here. This is required +// because a leader election might happen while the snapshot +// installation is in progress and the new leader might start sending +// append log entries to the ratis server. +omRatisServer.getOmStateMachine().pause(); + +try { + replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint); +} catch (Exception e) { + LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " + + "failed.", e); + return null; +} + +// Reload the OM DB store with the new checkpoint. +// Restart (unpause) the state machine and update its last applied index +// to the installed checkpoint's snapshot index. +try { + reloadOMState(); + omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex); +} catch (IOException e) { + LOG.error("Failed to reload OM state with new DB checkpoint.", e); + return null; +} + +// TODO: We should only return the snpashotIndex to the leader. +// Should be fixed after RATIS-586 +TermIndex newTermIndex = TermIndex.newTermIndex(0, +checkpointSnapshotIndex); + +return newTermIndex; + } + + /** + * Download the latest OM DB checkpoint from the leader OM. + * @param leaderId OMNodeID of the leader OM node. + * @return latest DB checkpoint from leader OM. + */ + private DBCheckpoint getDBCheckpointFromLeader(String leaderId) { +LOG.info("Downloading checkpoint from leader OM {} and reloading state " + +"from the checkpoint.", leaderId); + +try { + return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId); +} catch (IOException e) { + LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e); +} +return null; + } + + /** + * Replace the current OM DB with the new DB checkpoint. + * @param lastAppliedIndex the last applied index in the current OM DB. 
+ * @param omDBcheckpoint the new DB checkpoint + * @throws Exception + */ + void replaceOMDBWithCheckpoint(long lastAppliedIndex, + DBCheckpoint omDBcheckpoint) throws Exception { +// Stop the DB first +DBStore store = metadataManager.getStore(); +store.close(); + +// Take a backup of the current DB +File db = store.getDbLocation(); +String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX + +lastAppliedIndex + "_" + System.currentTimeMillis(); +File dbBackup = new File(db.getParentFile(), dbBackupName); + +try { + Files.move(db.toPath(), dbBackup.toPath()); +} catch (IOException e) { + LOG.error("Failed to create a backup of the current DB. Aborting " + + "snapshot installation."); +