[jira] [Commented] (HADOOP-16398) Exports Hadoop metrics to Prometheus

2019-07-18 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888534#comment-16888534
 ] 

Akira Ajisaka commented on HADOOP-16398:


Hi [~elek] and [~anu], would you review this?

After this issue is resolved, I'd like to parse NNTop metrics for Prometheus.

> Exports Hadoop metrics to Prometheus
> ------------------------------------
>
> Key: HADOOP-16398
> URL: https://issues.apache.org/jira/browse/HADOOP-16398
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: metrics
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: HADOOP-16398.001.patch
>
>
> Hadoop common side of HDDS-846. HDDS already has its own 
> PrometheusMetricsSink, so we can reuse the implementation.
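Most of the work in such a sink is name mangling: Hadoop metric names are CamelCase, while Prometheus conventions use snake_case. A minimal sketch of the idea (the class and method names here are illustrative, not the actual PrometheusMetricsSink API):

```java
import java.util.regex.Pattern;

class PrometheusNaming {
  // CamelCase boundary: a lowercase letter or digit followed by an uppercase letter.
  private static final Pattern CAMEL_BOUNDARY = Pattern.compile("([a-z0-9])([A-Z])");

  // Convert a Hadoop record/metric name pair into a Prometheus-style snake_case
  // name, e.g. ("Rpc", "NumOpenConnections") -> "rpc_num_open_connections".
  static String prometheusName(String recordName, String metricName) {
    String joined = recordName + "_" + metricName;
    return CAMEL_BOUNDARY.matcher(joined).replaceAll("$1_$2").toLowerCase();
  }

  public static void main(String[] args) {
    System.out.println(prometheusName("JvmMetrics", "GcCount")); // jvm_metrics_gc_count
  }
}
```

The regex-based split is the simplest approach; the real sink also has to render the Prometheus exposition format (HELP/TYPE lines and label pairs) around these names.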



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] Cosss7 opened a new pull request #1129: HDFS-14509 DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-07-18 Thread GitBox
Cosss7 opened a new pull request #1129: HDFS-14509 DN throws InvalidToken due 
to inequality of password when upgrade NN 2.x to 3.x
URL: https://github.com/apache/hadoop/pull/1129
 
 
   reference to jira


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (HADOOP-16431) Remove useless log in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888499#comment-16888499
 ] 

Hadoop QA commented on HADOOP-16431:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 36s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 30s{color} | {color:green} hadoop-openstack in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}107m  1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HADOOP-16431 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975213/HADOOP-16431.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0da43f6c67e5 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d545f9c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[GitHub] [hadoop] Cosss7 opened a new pull request #1128: HDFS-14551 NN throws NPE if downgrade it during rolling upgrade from 3.x to 2.x

2019-07-18 Thread GitBox
Cosss7 opened a new pull request #1128: HDFS-14551 NN throws NPE if downgrade 
it during rolling upgrade from 3.x to 2.x
URL: https://github.com/apache/hadoop/pull/1128
 
 
   reference jira.





[GitHub] [hadoop] ChenSammi commented on a change in pull request #1112: HDDS-1713. ReplicationManager fail to find proper node topology based…

2019-07-18 Thread GitBox
ChenSammi commented on a change in pull request #1112: HDDS-1713. 
ReplicationManager fail to find proper node topology based…
URL: https://github.com/apache/hadoop/pull/1112#discussion_r305190794
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementRackAware.java
 ##
 @@ -137,10 +137,6 @@ public void chooseNodeWithNoExcludedNodes() throws 
SCMException {
 datanodeDetails.get(2)));
 Assert.assertFalse(cluster.isSameParent(datanodeDetails.get(1),
 datanodeDetails.get(2)));
-Assert.assertFalse(cluster.isSameParent(datanodeDetails.get(0),
-datanodeDetails.get(3)));
-Assert.assertFalse(cluster.isSameParent(datanodeDetails.get(2),
-datanodeDetails.get(3)));
 
 Review comment:
  Thanks for the comments. I will remove the last two assertions in testFallback.





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305187064
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java
 ##
 @@ -123,6 +123,9 @@ private OMConfigKeys() {
   "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String
   OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+  "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;
 
 
 Review comment:
   Filed [HDDS-1831](https://issues.apache.org/jira/browse/HDDS-1831).





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305186358
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java
 ##
 @@ -123,6 +123,9 @@ private OMConfigKeys() {
   "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String
   OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+  "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;
 
 
 Review comment:
   Good suggestion! Let me file a followup jira to fix that. Want to get this 
patch committed today, it's been hanging around for over a month.





[GitHub] [hadoop] ChenSammi merged pull request #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…

2019-07-18 Thread GitBox
ChenSammi merged pull request #1067: HDDS-1653. Add option to "ozone scmcli 
printTopology" to order the ou…
URL: https://github.com/apache/hadoop/pull/1067
 
 
   





[GitHub] [hadoop] ChenSammi commented on issue #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…

2019-07-18 Thread GitBox
ChenSammi commented on issue #1067: HDDS-1653. Add option to "ozone scmcli 
printTopology" to order the ou…
URL: https://github.com/apache/hadoop/pull/1067#issuecomment-513070699
 
 
   +1, will commit shortly. 





[GitHub] [hadoop] xiaoyuyao commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…

2019-07-18 Thread GitBox
xiaoyuyao commented on issue #1124: HDDS-1749 : Ozone Client should randomize 
the list of nodes in pipeli…
URL: https://github.com/apache/hadoop/pull/1124#issuecomment-513068886
 
 
   Randomizing is good for balancing the load. However:
   For writes, we still must go through the leader (the first node).
   For reads, we can only use the random optimization for closed containers.
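The constraint described here — shuffle for read-side load balancing, but keep the leader (first node) fixed because writes must go through it — can be sketched as follows. This is a hypothetical helper, not the actual Ozone client code, and it ignores the closed-container check for reads:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PipelineShuffle {
  // Shuffle the replica list for read-side load balancing while leaving the
  // leader (index 0) in place, since writes must go through the leader.
  static <T> List<T> shuffleFollowers(List<T> nodes) {
    List<T> copy = new ArrayList<>(nodes);
    if (copy.size() > 2) {
      // subList is a live view, so shuffling it shuffles the tail of the copy.
      Collections.shuffle(copy.subList(1, copy.size()));
    }
    return copy;
  }

  public static void main(String[] args) {
    // The leader stays first; only the followers are permuted.
    System.out.println(shuffleFollowers(List.of("leader", "dn1", "dn2", "dn3")));
  }
}
```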


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] anuengineer commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
anuengineer commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305183995
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java
 ##
 @@ -123,6 +123,9 @@ private OMConfigKeys() {
   "ozone.om.ratis.log.appender.queue.byte-limit";
   public static final String
   OZONE_OM_RATIS_LOG_APPENDER_QUEUE_BYTE_LIMIT_DEFAULT = "32MB";
+  public static final String OZONE_OM_RATIS_LOG_PURGE_GAP =
+  "ozone.om.ratis.log.purge.gap";
+  public static final int OZONE_OM_RATIS_LOG_PURGE_GAP_DEFAULT = 100;
 
 
 Review comment:
   Can we please use the new format for configs? Here are some examples: 
https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API
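The "new format" referenced is an annotation-driven configuration style. As a toy illustration of the general idea only — the annotation, the config class, and the loader below are all defined locally for this sketch and are not the real HDDS configuration API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Properties;

// Hypothetical annotation standing in for an annotation-based config API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ConfigKey {
  String key();
  String defaultValue();
}

// The purge-gap setting from the diff, expressed as an annotated field
// instead of a pair of String/int constants.
class OmRatisConf {
  @ConfigKey(key = "ozone.om.ratis.log.purge.gap", defaultValue = "100")
  int logPurgeGap;
}

class ConfigDemo {
  // Populate an annotated config object from a Properties bag, falling back
  // to each field's declared default when the key is absent.
  static <T> T load(Class<T> cls, Properties props) throws Exception {
    T conf = cls.getDeclaredConstructor().newInstance();
    for (Field f : cls.getDeclaredFields()) {
      ConfigKey c = f.getAnnotation(ConfigKey.class);
      if (c == null) {
        continue;
      }
      String value = props.getProperty(c.key(), c.defaultValue());
      f.setAccessible(true);
      if (f.getType() == int.class) {
        f.setInt(conf, Integer.parseInt(value));
      } else {
        f.set(conf, value);
      }
    }
    return conf;
  }

  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    if (load(OmRatisConf.class, props).logPurgeGap != 100) {
      throw new AssertionError("default not applied");
    }
    props.setProperty("ozone.om.ratis.log.purge.gap", "250");
    if (load(OmRatisConf.class, props).logPurgeGap != 250) {
      throw new AssertionError("override not applied");
    }
  }
}
```

The advantage over paired key/default constants is that the key, type, and default live in one place and the typed value is injected automatically.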






[jira] [Updated] (HADOOP-16431) Remove useless log in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HADOOP-16431:
-
Priority: Minor  (was: Major)

> Remove useless log in IOUtils.java and ExceptionDiags.java
> --
>
> Key: HADOOP-16431
> URL: https://issues.apache.org/jira/browse/HADOOP-16431
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
> Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warning 
> message and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}
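As a self-contained illustration of how wrapWithMessage behaves (the SLF4J logger is omitted, and the generic bounds are written out explicitly):

```java
import java.io.IOException;
import java.lang.reflect.Constructor;

class WrapDemo {
  // Standalone copy of the wrapWithMessage pattern: try to rebuild the
  // exception with a more descriptive message, keeping the original
  // exception as the cause.
  @SuppressWarnings("unchecked")
  static <T extends Throwable> T wrapWithMessage(T exception, String msg) {
    Class<? extends Throwable> clazz = exception.getClass();
    try {
      Constructor<? extends Throwable> ctor = clazz.getConstructor(String.class);
      Throwable t = ctor.newInstance(msg);
      return (T) t.initCause(exception);
    } catch (Throwable e) {
      // No (String) constructor (or it failed): fall back to the original.
      return exception;
    }
  }

  public static void main(String[] args) {
    IOException cause = new IOException("disk error");
    IOException wrapped = wrapWithMessage(cause, "while reading /data/blk_42");
    // wrapped carries the richer message; the original survives as the cause.
    System.out.println(wrapped.getMessage() + " <- " + wrapped.getCause().getMessage());
  }
}
```

The catch-all path is exactly the case being discussed: for exception classes with no (String) constructor the wrap silently fails and the original exception is rethrown, so a WARN-level log there is mostly noise.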






[jira] [Updated] (HADOOP-16431) Remove useless log in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HADOOP-16431:
-
Summary: Remove useless log in IOUtils.java and ExceptionDiags.java  (was: 
Change Log Level to trace in IOUtils.java and ExceptionDiags.java)

> Remove useless log in IOUtils.java and ExceptionDiags.java
> --
>
> Key: HADOOP-16431
> URL: https://issues.apache.org/jira/browse/HADOOP-16431
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warning 
> message and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}






[jira] [Commented] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888450#comment-16888450
 ] 

Lisheng Sun commented on HADOOP-16431:
--

Thanks [~elgoiri] for your good suggestions. I have updated this patch. Could 
you help review it, and assign this issue to me? Thank you.

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -
>
> Key: HADOOP-16431
> URL: https://issues.apache.org/jira/browse/HADOOP-16431
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warning 
> message and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}






[GitHub] [hadoop] arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from 
OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-513063053
 
 
   /retest





[jira] [Updated] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HADOOP-16431:
-
Attachment: HADOOP-16431.002.patch

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -
>
> Key: HADOOP-16431
> URL: https://issues.apache.org/jira/browse/HADOOP-16431
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16431.001.patch, HADOOP-16431.002.patch
>
>
> When there is no String constructor for the exception, we log a warning 
> message and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}






[jira] [Commented] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888439#comment-16888439
 ] 

Íñigo Goiri commented on HADOOP-16431:
--

Given that log-and-throw is not a very good approach, I guess the right thing 
would be to just not log it at all.

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -
>
> Key: HADOOP-16431
> URL: https://issues.apache.org/jira/browse/HADOOP-16431
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16431.001.patch
>
>
> When there is no String constructor for the exception, we log a warning 
> message and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}






[GitHub] [hadoop] mackrorysd commented on issue #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration

2019-07-18 Thread GitBox
mackrorysd commented on issue #1125: HADOOP-13868. [s3a] New default for S3A 
multi-part configuration
URL: https://github.com/apache/hadoop/pull/1125#issuecomment-513057068
 
 
   The common unit test failure is unrelated - this patch does not change common 
or anything it depends on. No other tests are included because this is 
performance tuning; see the JIRA for numbers from performance testing. An 
identical patch was +1'd on JIRA - I will merge in 12 hours if I don't hear 
otherwise.





[jira] [Commented] (HADOOP-16431) Change Log Level to trace in IOUtils.java and ExceptionDiags.java

2019-07-18 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888429#comment-16888429
 ] 

Lisheng Sun commented on HADOOP-16431:
--

[~linyiqun]  [~jojochuang] [~hexiaoqiao] [~elgoiri] Could you help review this 
patch? Thank you.

> Change Log Level to trace in IOUtils.java and ExceptionDiags.java
> -
>
> Key: HADOOP-16431
> URL: https://issues.apache.org/jira/browse/HADOOP-16431
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16431.001.patch
>
>
> When there is no String constructor for the exception, we log a warning 
> message and rethrow the exception. We can change the log level to TRACE/DEBUG.
> {code:java}
> private static <T extends Throwable> T wrapWithMessage(
>     T exception, String msg) {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor =
>         clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     LOG.trace("Unable to wrap exception of type " +
>         clazz + ": it has no (String) constructor", e);
>     return exception;
>   }
> }
> {code}






[GitHub] [hadoop] hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-513052953
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 38 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 66 | Maven dependency ordering for branch |
   | +1 | mvninstall | 482 | trunk passed |
   | +1 | compile | 265 | trunk passed |
   | +1 | checkstyle | 77 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 865 | branch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 187 | trunk passed |
   | 0 | spotbugs | 316 | Used deprecated FindBugs config; considering switching to SpotBugs. |
   | +1 | findbugs | 509 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 74 | Maven dependency ordering for patch |
   | +1 | mvninstall | 435 | the patch passed |
   | +1 | compile | 271 | the patch passed |
   | +1 | cc | 271 | the patch passed |
   | +1 | javac | 271 | the patch passed |
   | +1 | checkstyle | 73 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 1 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 648 | patch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 159 | the patch passed |
   | +1 | findbugs | 536 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 277 | hadoop-hdds in the patch passed. |
   | -1 | unit | 1639 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 50 | The patch does not generate ASF License warnings. |
   | | | 6814 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.om.snapshot.TestOzoneManagerSnapshotProvider |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.container.server.TestSecureContainerServer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/948 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml cc |
   | uname | Linux 0196a0d19d2c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d545f9c |
   | Default Java | 1.8.0_212 |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/artifact/out/patch-unit-hadoop-ozone.txt |
   |  Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/testReport/ |
   | Max. process+thread count | 5388 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-ozone/common hadoop-ozone/integration-test hadoop-ozone/ozone-manager U: . |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-948/5/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hadoop] commanderchewbacca closed pull request #1127: Gcs connector

2019-07-18 Thread GitBox
commanderchewbacca closed pull request #1127: Gcs connector
URL: https://github.com/apache/hadoop/pull/1127
 
 
   





[GitHub] [hadoop] commanderchewbacca opened a new pull request #1127: Gcs connector

2019-07-18 Thread GitBox
commanderchewbacca opened a new pull request #1127: Gcs connector
URL: https://github.com/apache/hadoop/pull/1127
 
 
   added gcs connector for  operator-metering





[GitHub] [hadoop] commanderchewbacca closed pull request #1126: Gcs connector

2019-07-18 Thread GitBox
commanderchewbacca closed pull request #1126: Gcs connector
URL: https://github.com/apache/hadoop/pull/1126
 
 
   





[GitHub] [hadoop] commanderchewbacca opened a new pull request #1126: Gcs connector

2019-07-18 Thread GitBox
commanderchewbacca opened a new pull request #1126: Gcs connector
URL: https://github.com/apache/hadoop/pull/1126
 
 
   added gcs connector to images to allow for gcs use for operator-metering





[jira] [Commented] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888419#comment-16888419
 ] 

Josh Rosen commented on HADOOP-16437:
-

A fallback configuration is an interesting idea. I guess the addition of a new 
configuration alias for the typo would, itself, be a behavior change because 
what was previously a no-op would start having an actual effect (so maybe we'd 
want to {{releasenotes}} that?).
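The behavior change described above can be sketched in a few lines; the `Config` class below is purely illustrative (not Hadoop's `Configuration`), but the key names are the real S3A ones. Once the misspelled key becomes a fallback alias, a value that users previously set under it stops being a silent no-op:

```python
# Illustrative sketch of the fallback-alias concern: a typo'd key that was
# formerly ignored starts taking effect once it is registered as an alias.
# The Config class is hypothetical; only the key names come from the thread.

DEPRECATED_ALIASES = {
    # misspelled key -> canonical key
    "fs.s3a.experimental.fadvise": "fs.s3a.experimental.input.fadvise",
}

class Config:
    def __init__(self, props):
        self.props = dict(props)

    def get(self, key, default=None):
        if key in self.props:
            return self.props[key]
        # Fallback: honour a value set under a deprecated/typo'd alias.
        for alias, canonical in DEPRECATED_ALIASES.items():
            if canonical == key and alias in self.props:
                return self.props[alias]
        return default

# A user who, following the old docs, set only the misspelled key:
conf = Config({"fs.s3a.experimental.fadvise": "random"})

# Before the alias existed this returned the default, i.e. the setting was
# silently ignored; with the alias in place it now takes effect.
print(conf.get("fs.s3a.experimental.input.fadvise", "normal"))  # -> random
```

This is why a release note might be warranted: upgrading would change effective configuration for anyone who had the typo'd key set.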

> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, fs/s3
>Affects Versions: 3.2.0, 3.3.0, 3.1.2
>Reporter: Josh Rosen
>Priority: Major
> Fix For: 3.3.0
>
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.
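For reference, a minimal core-site.xml fragment using the key as actually read (the `random` value is one of the S3A fadvise policies; treat the snippet as illustrative):

```xml
<!-- Note the "input" segment, which the docs had dropped -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
```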



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] hadoop-yetus commented on issue #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1125: HADOOP-13868. [s3a] New default for S3A 
multi-part configuration
URL: https://github.com/apache/hadoop/pull/1125#issuecomment-513048431
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 65 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 23 | Maven dependency ordering for branch |
   | +1 | mvninstall | 1162 | trunk passed |
   | +1 | compile | 1033 | trunk passed |
   | +1 | checkstyle | 154 | trunk passed |
   | +1 | mvnsite | 128 | trunk passed |
   | +1 | shadedclient | 1091 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 105 | trunk passed |
   | 0 | spotbugs | 81 | Used deprecated FindBugs config; considering switching 
to SpotBugs. |
   | +1 | findbugs | 223 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for patch |
   | +1 | mvninstall | 97 | the patch passed |
   | +1 | compile | 1059 | the patch passed |
   | +1 | javac | 1059 | the patch passed |
   | +1 | checkstyle | 191 | the patch passed |
   | +1 | mvnsite | 132 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 2 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 737 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 96 | the patch passed |
   | +1 | findbugs | 214 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 546 | hadoop-common in the patch failed. |
   | +1 | unit | 310 | hadoop-aws in the patch passed. |
   | +1 | asflicense | 54 | The patch does not generate ASF License warnings. |
   | | | 7454 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1125 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient xml findbugs checkstyle |
   | uname | Linux eb09206b2aa1 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d545f9c |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/testReport/ |
   | Max. process+thread count | 1345 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1125/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hadoop] hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-513043559
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 98 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 24 | Maven dependency ordering for branch |
   | +1 | mvninstall | 527 | trunk passed |
   | +1 | compile | 269 | trunk passed |
   | +1 | checkstyle | 76 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 959 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 182 | trunk passed |
   | 0 | spotbugs | 375 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 606 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 33 | Maven dependency ordering for patch |
   | +1 | mvninstall | 442 | the patch passed |
   | +1 | compile | 279 | the patch passed |
   | +1 | cc | 279 | the patch passed |
   | +1 | javac | 279 | the patch passed |
   | +1 | checkstyle | 85 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 1 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 764 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 169 | the patch passed |
   | +1 | findbugs | 537 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 336 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2175 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 55 | The patch does not generate ASF License warnings. |
   | | | 7789 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.om.snapshot.TestOzoneManagerSnapshotProvider |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.container.server.TestSecureContainerServer |
   |   | hadoop.ozone.client.rpc.Test2WayCommitInRatis |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/948 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle xml cc |
   | uname | Linux f2938c9c233e 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d545f9c |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/testReport/ |
   | Max. process+thread count | 4968 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-ozone/common 
hadoop-ozone/integration-test hadoop-ozone/ozone-manager U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/4/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   



[GitHub] [hadoop] hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-513032608
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 70 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 22 | Maven dependency ordering for branch |
   | +1 | mvninstall | 487 | trunk passed |
   | +1 | compile | 305 | trunk passed |
   | +1 | checkstyle | 74 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 965 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 164 | trunk passed |
   | 0 | spotbugs | 349 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 553 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 32 | Maven dependency ordering for patch |
   | -1 | mvninstall | 160 | hadoop-ozone in the patch failed. |
   | -1 | compile | 62 | hadoop-ozone in the patch failed. |
   | -1 | cc | 62 | hadoop-ozone in the patch failed. |
   | -1 | javac | 62 | hadoop-ozone in the patch failed. |
   | +1 | checkstyle | 84 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 2 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 756 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 180 | the patch passed |
   | -1 | findbugs | 108 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | +1 | unit | 339 | hadoop-hdds in the patch passed. |
   | -1 | unit | 118 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 38 | The patch does not generate ASF License warnings. |
   | | | 5149 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/948 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle xml cc |
   | uname | Linux 5fd557cb2f6c 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d5ef38b |
   | Default Java | 1.8.0_212 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | cc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | javac | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-findbugs-hadoop-ozone.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/testReport/ |
   | Max. process+thread count | 411 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-ozone/common 
hadoop-ozone/integration-test hadoop-ozone/ozone-manager U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-948/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888395#comment-16888395
 ] 

Sean Mackrory commented on HADOOP-13868:


Bit-rot after only 2 1/2 years? Imagine that! Actually the only part that 
doesn't apply cleanly is the documentation, and that's just because it's 
looking 100 lines away from where it should. Resubmitted as a pull request to 
verify a clean Yetus run, but as the patch is virtually identical I'll assume 
your +1 still applies unless I hear otherwise.

> New defaults for S3A multi-part configuration
> -
>
> Key: HADOOP-13868
> URL: https://issues.apache.org/jira/browse/HADOOP-13868
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.0, 3.0.0-alpha1
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-13868.001.patch, HADOOP-13868.002.patch, 
> optimizing-multipart-s3a.sh
>
>
> I've been looking at a big performance regression when writing to S3 from 
> Spark that appears to have been introduced with HADOOP-12891.
> In the Amazon SDK, the default threshold for multi-part copies is 320x the 
> threshold for multi-part uploads (and the block size is 20x bigger), so I 
> don't think it's necessarily wise for us to have them be the same.
> I did some quick tests and it seems to me the sweet spot when multi-part 
> copies start being faster is around 512MB. It wasn't as significant, but 
> using 104857600 (Amazon's default) for the blocksize was also slightly better.
> I propose we do the following, although they're independent decisions:
> (1) Split the configuration. Ideally, I'd like to have 
> fs.s3a.multipart.copy.threshold and fs.s3a.multipart.upload.threshold (and 
> corresponding properties for the block size). But then there's the question 
> of what to do with the existing fs.s3a.multipart.* properties. Deprecation? 
> Leave it as a short-hand for configuring both (that's overridden by the more 
> specific properties?).
> (2) Consider increasing the default values. In my tests, 256 MB seemed to be 
> where multipart uploads came into their own, and 512 MB was where multipart 
> copies started outperforming the alternative. Would be interested to hear 
> what other people have seen.
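The proposed split in the description above can be sketched as follows; the `*.upload.*` and `*.copy.*` key names are the proposed ones, not shipped configuration, and the SDK figures in the comment (roughly a 5 GB copy threshold vs a 16 MB upload threshold, giving the 320x ratio) are my reading of the Amazon SDK defaults rather than anything stated in the thread:

```python
# Sketch of the proposed key split: specific per-operation keys override the
# existing fs.s3a.multipart.threshold short-hand when both are present.

MB = 1024 * 1024

def multipart_thresholds(conf):
    shared = conf.get("fs.s3a.multipart.threshold", 256 * MB)
    return {
        "upload": conf.get("fs.s3a.multipart.upload.threshold", shared),
        "copy": conf.get("fs.s3a.multipart.copy.threshold", shared),
    }

# Short-hand only: both operations inherit the same threshold.
both = multipart_thresholds({"fs.s3a.multipart.threshold": 256 * MB})

# A specific copy threshold wins over the short-hand, echoing the SDK's much
# larger default for copies than for uploads.
split = multipart_thresholds({
    "fs.s3a.multipart.threshold": 256 * MB,
    "fs.s3a.multipart.copy.threshold": 512 * MB,
})
print(both["copy"], split["copy"])  # -> 268435456 536870912
```

This resolution order keeps the existing `fs.s3a.multipart.*` properties working unchanged for users who never set the new keys.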






[GitHub] [hadoop] mackrorysd opened a new pull request #1125: HADOOP-13868. [s3a] New default for S3A multi-part configuration

2019-07-18 Thread GitBox
mackrorysd opened a new pull request #1125: HADOOP-13868. [s3a] New default for 
S3A multi-part configuration
URL: https://github.com/apache/hadoop/pull/1125
 
 
   





[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888392#comment-16888392
 ] 

Hadoop QA commented on HADOOP-13868:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HADOOP-13868 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13868 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12842566/HADOOP-13868.002.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/16388/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.









[GitHub] [hadoop] hadoop-yetus commented on issue #1108: HDDS-1805. Implement S3 Initiate MPU request to use Cache and DoubleBuffer.

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1108: HDDS-1805. Implement S3 Initiate MPU 
request to use Cache and DoubleBuffer.
URL: https://github.com/apache/hadoop/pull/1108#issuecomment-513019346
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 39 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 11 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 16 | Maven dependency ordering for branch |
   | +1 | mvninstall | 483 | trunk passed |
   | +1 | compile | 269 | trunk passed |
   | +1 | checkstyle | 79 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 863 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 166 | trunk passed |
   | 0 | spotbugs | 321 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 503 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for patch |
   | +1 | mvninstall | 451 | the patch passed |
   | +1 | compile | 269 | the patch passed |
   | +1 | cc | 269 | the patch passed |
   | +1 | javac | 269 | the patch passed |
   | +1 | checkstyle | 87 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 692 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 166 | the patch passed |
   | +1 | findbugs | 527 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 294 | hadoop-hdds in the patch passed. |
   | -1 | unit | 1616 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 56 | The patch does not generate ASF License warnings. |
   | | | 6791 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.container.server.TestSecureContainerServer |
   |   | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.8 Server=18.09.8 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1108 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle cc |
   | uname | Linux cb07dd5ad9f5 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / d5ef38b |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/testReport/ |
   | Max. process+thread count | 5157 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/common hadoop-ozone/ozone-manager U: 
hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1108/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888382#comment-16888382
 ] 

Steve Loughran commented on HADOOP-13868:
-

LGTM +1









[jira] [Commented] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888377#comment-16888377
 ] 

Hudson commented on HADOOP-16437:
-

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16955 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16955/])
HADOOP-16437 documentation typo fix: fs.s3a.experimental.input.fadvise (stevel: 
rev d545f9c2903fe63f44c1330d9ce55a85de93804f)
* (edit) 
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstreambuilder.md
* (edit) 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
* (edit) 
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md


> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, fs/s3
>Affects Versions: 3.2.0, 3.3.0, 3.1.2
>Reporter: Josh Rosen
>Priority: Major
> Fix For: 3.3.0
>
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.






[GitHub] [hadoop] steveloughran merged pull request #1117: HADOOP-16437 documentation typo fix: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread GitBox
steveloughran merged pull request #1117: HADOOP-16437 documentation typo fix: 
fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
URL: https://github.com/apache/hadoop/pull/1117
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888374#comment-16888374
 ] 

Steve Loughran commented on HADOOP-16437:
-

thanks. Applied. We're probably going to have to backport this a long way, 
aren't we? And probably everywhere else I've written it.

There's another thing we could do here, given the docs are out and about: we 
actually add the other entry as a deprecated key. That way people who ask for 
it, get it + a warning. Thoughts?

(oh, and now we have a better openFile() command, I do actually want to make 
seek policy a standard option we could implement in all the stores 
consistently, so that ORC/Parquet code can know to ask for it)

> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, fs/s3
>Affects Versions: 3.2.0, 3.3.0, 3.1.2
>Reporter: Josh Rosen
>Priority: Major
> Fix For: 3.3.0
>
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.






[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305137102
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install the latest checkpoint from the leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " 
+
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " 
+
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, 
e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");

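
The guard-pause-swap-reload sequence in `installSnapshot()` above can be summarized with a simplified, self-contained model (hypothetical demo class, not the real `OzoneManager`; DB replacement and state reload are elided as a comment):

```java
// Simplified model of the installSnapshot() control flow: reject a stale
// checkpoint, otherwise pause the state machine, swap in the checkpoint,
// then unpause at the checkpoint's snapshot index.
public class InstallSnapshotFlow {
    long lastAppliedIndex;
    boolean paused;

    InstallSnapshotFlow(long lastAppliedIndex) {
        this.lastAppliedIndex = lastAppliedIndex;
    }

    // Returns the new applied index if installed, or -1 if rejected.
    long install(long checkpointSnapshotIndex) {
        if (checkpointSnapshotIndex <= lastAppliedIndex) {
            return -1; // checkpoint is not ahead of local state: refuse
        }
        paused = true;  // no new transactions while the DB is replaced
        // ... replace the OM DB with the checkpoint, reload OM state ...
        lastAppliedIndex = checkpointSnapshotIndex;
        paused = false; // unpause at the installed checkpoint's index
        return lastAppliedIndex;
    }

    public static void main(String[] args) {
        InstallSnapshotFlow om = new InstallSnapshotFlow(100);
        assert om.install(90) == -1;   // stale checkpoint rejected
        assert om.install(150) == 150; // newer checkpoint installed
        assert om.lastAppliedIndex == 150 && !om.paused;
    }
}
```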
[jira] [Updated] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16437:

Fix Version/s: 3.3.0

> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, fs/s3
>Affects Versions: 3.2.0, 3.3.0, 3.1.2
>Reporter: Josh Rosen
>Priority: Major
> Fix For: 3.3.0
>
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.






[jira] [Updated] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16437:

Affects Version/s: 3.3.0
   3.2.0
   3.1.2

> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, fs/s3
>Affects Versions: 3.2.0, 3.3.0, 3.1.2
>Reporter: Josh Rosen
>Priority: Major
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.






[jira] [Updated] (HADOOP-16437) Documentation typos: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16437:

Component/s: fs/s3

> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, fs/s3
>Reporter: Josh Rosen
>Priority: Major
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.






[GitHub] [hadoop] steveloughran commented on issue #1117: HADOOP-16437 documentation typo fix: fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise

2019-07-18 Thread GitBox
steveloughran commented on issue #1117: HADOOP-16437 documentation typo fix: 
fs.s3a.experimental.fadvise -> fs.s3a.experimental.input.fadvise
URL: https://github.com/apache/hadoop/pull/1117#issuecomment-513008816
 
 
   +1, committed





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305134924
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install the latest checkpoint from the leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " 
+
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " 
+
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, 
e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");

[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305135149
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install the latest checkpoint from the leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " 
+
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
 
 Review comment:
   Done.





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305134884
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install the latest checkpoint from the leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " 
+
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " 
+
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, 
e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");

[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305134992
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer 
ratisServer) {
 ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true)
 .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build();
 this.executorService = HadoopExecutors.newSingleThreadExecutor(build);
+this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor();
 
 Review comment:
   Done.





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305113822
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java
 ##
 @@ -0,0 +1,193 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations 
under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om;
+
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.VolumeArgs;
+import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs;
+import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer;
+import org.apache.hadoop.utils.db.DBCheckpoint;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey;
+
+/**
+ * Tests the Ratis snapshots feature in OM.
+ */
+public class TestOMRatisSnapshots {
+
+  private MiniOzoneHAClusterImpl cluster = null;
+  private ObjectStore objectStore;
+  private OzoneConfiguration conf;
+  private String clusterId;
+  private String scmId;
+  private int numOfOMs = 3;
+  private static final long SNAPSHOT_THRESHOLD = 50;
+  private static final int LOG_PURGE_GAP = 50;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  @Rule
+  public Timeout timeout = new Timeout(3000_000);
+
+  /**
+   * Create a MiniOzoneHAClusterImpl for testing. The cluster initially has one
+   * inactive OM. So at the start of the cluster, there will be 2 active and 1
+   * inactive OM.
+   *
+   * @throws IOException
+   */
+  @Before
+  public void init() throws Exception {
+conf = new OzoneConfiguration();
+clusterId = UUID.randomUUID().toString();
+scmId = UUID.randomUUID().toString();
+conf.setLong(
+OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
+SNAPSHOT_THRESHOLD);
+conf.setInt(OMConfigKeys.OZONE_OM_RATIS_LOG_PURGE_GAP, LOG_PURGE_GAP);
+cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf)
+.setClusterId(clusterId)
+.setScmId(scmId)
+.setOMServiceId("om-service-test1")
+.setNumOfOzoneManagers(numOfOMs)
+.setNumOfActiveOMs(2)
+.build();
+cluster.waitForClusterToBeReady();
+objectStore = OzoneClientFactory.getRpcClient(conf).getObjectStore();
+  }
+
+  /**
+   * Shut down the MiniOzoneHAClusterImpl.
+   */
+  @After
+  public void shutdown() {
+if (cluster != null) {
+  cluster.shutdown();
+}
+  }
+
+  @Test
+  public void testInstallSnapshot() throws Exception {
+// Get the leader OM
+String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider()
+.getCurrentProxyOMNodeId();
+OzoneManager leaderOM = cluster.getOzoneManager(leaderOMNodeId);
+OzoneManagerRatisServer leaderRatisServer = leaderOM.getOmRatisServer();
+
+// Find the inactive OM
+String followerNodeId = leaderOM.getPeerNodes().get(0).getOMNodeId();
+if (cluster.isOMActive(followerNodeId)) {
+  followerNodeId = leaderOM.getPeerNodes().get(1).getOMNodeId();
+}
+OzoneManager followerOM = cluster.getOzoneManager(followerNodeId);
+
+// Do some transactions so that the log index increases
+String userName = "user" + RandomStringUtils.randomNumeric(5);
+String adminName = "admin" + RandomStringUtils.randomNumeric(5);
+String volumeName = "volume" + RandomStringUtils.randomNumeric(5);
+

[GitHub] [hadoop] hadoop-yetus commented on issue #1067: HDDS-1653. Add option to "ozone scmcli printTopology" to order the ou…

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1067: HDDS-1653. Add option to "ozone scmcli 
printTopology" to order the ou…
URL: https://github.com/apache/hadoop/pull/1067#issuecomment-512994540
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 76 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | 0 | shelldocs | 1 | Shelldocs was not available. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 71 | Maven dependency ordering for branch |
   | +1 | mvninstall | 518 | trunk passed |
   | +1 | compile | 315 | trunk passed |
   | +1 | checkstyle | 88 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 871 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 183 | trunk passed |
   | 0 | spotbugs | 342 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 558 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 41 | Maven dependency ordering for patch |
   | +1 | mvninstall | 497 | the patch passed |
   | +1 | compile | 321 | the patch passed |
   | +1 | javac | 321 | the patch passed |
   | +1 | checkstyle | 93 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | shellcheck | 1 | There were no new shellcheck issues. |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 814 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 218 | the patch passed |
   | +1 | findbugs | 653 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 125 | hadoop-hdds in the patch failed. |
   | -1 | unit | 2095 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
   | | | 7870 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.lock.TestLockManager |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.container.server.TestSecureContainerServer |
   |   | hadoop.ozone.client.rpc.TestWatchForCommit |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1067 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle shellcheck shelldocs |
   | uname | Linux 7fc322282b06 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9838a47 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/testReport/ |
   | Max. process+thread count | 4862 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/tools hadoop-ozone/dist U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1067/2/console |
   | versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hadoop] hadoop-yetus commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket request to use Cache and DoubleBuffer.

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket 
request to use Cache and DoubleBuffer.
URL: https://github.com/apache/hadoop/pull/1097#issuecomment-512990741
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 40 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 5 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 14 | Maven dependency ordering for branch |
   | +1 | mvninstall | 472 | trunk passed |
   | +1 | compile | 258 | trunk passed |
   | +1 | checkstyle | 74 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 847 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 147 | trunk passed |
   | 0 | spotbugs | 312 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 504 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 16 | Maven dependency ordering for patch |
   | +1 | mvninstall | 410 | the patch passed |
   | +1 | compile | 244 | the patch passed |
   | +1 | javac | 244 | the patch passed |
   | +1 | checkstyle | 62 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 616 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 148 | the patch passed |
   | +1 | findbugs | 514 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 286 | hadoop-hdds in the patch failed. |
   | -1 | unit | 1921 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
   | | | 6769 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdds.scm.container.placement.algorithms.TestContainerPlacementFactory |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.container.server.TestSecureContainerServer |
   |   | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.8 Server=18.09.8 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1097 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 4013f1963c43 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9838a47 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/testReport/ |
   | Max. process+thread count | 5285 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/common hadoop-ozone/ozone-manager U: 
hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1097/2/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API.

2019-07-18 Thread GitBox
swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in 
OM to serve delta updates through an API.
URL: https://github.com/apache/hadoop/pull/1033#discussion_r305115724
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/RDBStore.java
 ##
 @@ -318,6 +320,44 @@ public CodecRegistry getCodecRegistry() {
 return codecRegistry;
   }
 
+  @Override
+  public DBUpdatesWrapper getUpdatesSince(long sequenceNumber)
+  throws DataNotFoundException {
+
+DBUpdatesWrapper dbUpdatesWrapper = new DBUpdatesWrapper();
+try {
+  TransactionLogIterator transactionLogIterator =
+  db.getUpdatesSince(sequenceNumber);
+
+  boolean flag = true;
+
+  while (transactionLogIterator.isValid()) {
 
 Review comment:
   Does this imply flush to sst can happen while iterating the log?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305113822
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java
 ##
 @@ -0,0 +1,193 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations 
under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om;
+
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.VolumeArgs;
+import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs;
+import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer;
+import org.apache.hadoop.utils.db.DBCheckpoint;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey;
+
+/**
+ * Tests the Ratis snapshots feature in OM.
+ */
+public class TestOMRatisSnapshots {
+
+  private MiniOzoneHAClusterImpl cluster = null;
+  private ObjectStore objectStore;
+  private OzoneConfiguration conf;
+  private String clusterId;
+  private String scmId;
+  private int numOfOMs = 3;
+  private static final long SNAPSHOT_THRESHOLD = 50;
+  private static final int LOG_PURGE_GAP = 50;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  @Rule
+  public Timeout timeout = new Timeout(3000_000);
+
+  /**
+   * Create a MiniDFSCluster for testing. The cluster initially has one
+   * inactive OM. So at the start of the cluster, there will be 2 active and 1
+   * inactive OM.
+   *
+   * @throws IOException
+   */
+  @Before
+  public void init() throws Exception {
+conf = new OzoneConfiguration();
+clusterId = UUID.randomUUID().toString();
+scmId = UUID.randomUUID().toString();
+conf.setLong(
+OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
+SNAPSHOT_THRESHOLD);
+conf.setInt(OMConfigKeys.OZONE_OM_RATIS_LOG_PURGE_GAP, LOG_PURGE_GAP);
+cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf)
+.setClusterId(clusterId)
+.setScmId(scmId)
+.setOMServiceId("om-service-test1")
+.setNumOfOzoneManagers(numOfOMs)
+.setNumOfActiveOMs(2)
+.build();
+cluster.waitForClusterToBeReady();
+objectStore = OzoneClientFactory.getRpcClient(conf).getObjectStore();
+  }
+
+  /**
+   * Shutdown MiniDFSCluster.
+   */
+  @After
+  public void shutdown() {
+if (cluster != null) {
+  cluster.shutdown();
+}
+  }
+
+  @Test
+  public void testInstallSnapshot() throws Exception {
+// Get the leader OM
+String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider()
+.getCurrentProxyOMNodeId();
+OzoneManager leaderOM = cluster.getOzoneManager(leaderOMNodeId);
+OzoneManagerRatisServer leaderRatisServer = leaderOM.getOmRatisServer();
+
+// Find the inactive OM
+String followerNodeId = leaderOM.getPeerNodes().get(0).getOMNodeId();
+if (cluster.isOMActive(followerNodeId)) {
+  followerNodeId = leaderOM.getPeerNodes().get(1).getOMNodeId();
+}
+OzoneManager followerOM = cluster.getOzoneManager(followerNodeId);
+
+// Do some transactions so that the log index increases
+String userName = "user" + RandomStringUtils.randomNumeric(5);
+String adminName = "admin" + RandomStringUtils.randomNumeric(5);
+String volumeName = "volume" + RandomStringUtils.randomNumeric(5);
+

[GitHub] [hadoop] swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API.

2019-07-18 Thread GitBox
swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in 
OM to serve delta updates through an API.
URL: https://github.com/apache/hadoop/pull/1033#discussion_r305113051
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/DataNotFoundException.java
 ##
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.utils.db;
+
+import java.io.IOException;
+
+/**
+ * Thrown if RocksDB is unable to find requested data from WAL file.
+ */
+public class DataNotFoundException extends IOException {
 
 Review comment:
   or just SequenceNumberNotFoundException?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305113075
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " 
+
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " 
+
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, 
e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");

[GitHub] [hadoop] hadoop-yetus commented on issue #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1120: HDDS-1822. NPE in 
SCMCommonPolicy.chooseDatanodes
URL: https://github.com/apache/hadoop/pull/1120#issuecomment-512985848
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 72 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 504 | trunk passed |
   | +1 | compile | 262 | trunk passed |
   | +1 | checkstyle | 72 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 956 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 176 | trunk passed |
   | 0 | spotbugs | 364 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 601 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 449 | the patch passed |
   | +1 | compile | 275 | the patch passed |
   | +1 | javac | 275 | the patch passed |
   | +1 | checkstyle | 79 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 754 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 177 | the patch passed |
   | +1 | findbugs | 620 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 346 | hadoop-hdds in the patch failed. |
   | -1 | unit | 2070 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 56 | The patch does not generate ASF License warnings. |
   | | | 7621 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdds.scm.block.TestBlockManager |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.TestStorageContainerManager |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.container.server.TestSecureContainerServer |
   |   | hadoop.hdds.scm.pipeline.TestSCMPipelineManager |
   |   | hadoop.ozone.client.rpc.TestWatchForCommit |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1120 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 61e5691f22ee 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9838a47 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/testReport/ |
   | Max. process+thread count | 4659 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1120/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in OM to serve delta updates through an API.

2019-07-18 Thread GitBox
swagle commented on a change in pull request #1033: HDDS-1391 : Add ability in 
OM to serve delta updates through an API.
URL: https://github.com/apache/hadoop/pull/1033#discussion_r305112491
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/DataNotFoundException.java
 ##
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.utils.db;
+
+import java.io.IOException;
+
+/**
+ * Thrown if RocksDB is unable to find requested data from WAL file.
+ */
+public class DataNotFoundException extends IOException {
 
 Review comment:
   The exception seems too broad, can be DataNotFoundForSequenceNumberException?
   The message can, therefore, ask for the sequence number and client-side code 
can log it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305111915
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, 
e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");

[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305108487
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java
 ##
 @@ -0,0 +1,193 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations 
under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om;
+
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.VolumeArgs;
+import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs;
+import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer;
+import org.apache.hadoop.utils.db.DBCheckpoint;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey;
+
+/**
+ * Tests the Ratis snapshots feature in OM.
+ */
+public class TestOMRatisSnapshots {
+
+  private MiniOzoneHAClusterImpl cluster = null;
+  private ObjectStore objectStore;
+  private OzoneConfiguration conf;
+  private String clusterId;
+  private String scmId;
+  private int numOfOMs = 3;
+  private static final long SNAPSHOT_THRESHOLD = 50;
+  private static final int LOG_PURGE_GAP = 50;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  @Rule
+  public Timeout timeout = new Timeout(3000_000);
+
+  /**
+   * Create a MiniDFSCluster for testing. The cluster initially has one
 
 Review comment:
   Done





[GitHub] [hadoop] hanishakoneru commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on issue #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-512981397
 
 
   > **Question:**
   > In ShouldInstallSnapshot, it calls getLatestSnapshot() from 
stateMachineStorage, as we have our own snapshot implementation in 
stateMachine, do we need to override that method to provide correct 
snapshotInfo? Or could you provide some info how this works?
   
   We do not want the snapshots to be handled via Ratis. When a follower 
receives an installSnapshot notification, it sends the newly loaded DB's 
snapshot index back to the leader. The leader updates the follower's snapshot 
index through this.
   
   But when the Ratis server is starting up, it should be able to determine the 
latest snapshot index. Otherwise, all the logs will be replayed from the start. 
I will create a new Jira to address this. Thanks Bharat.
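
The flow described above can be sketched as follows (a hedged illustration of
the control flow only; the class, method, and field names are invented for the
sketch and are not the OM implementation):

```java
/**
 * Illustration of the install-snapshot handshake described above: the
 * follower installs the leader's DB checkpoint outside of Ratis and reports
 * the resulting snapshot index back, refusing stale checkpoints whose index
 * is not ahead of the follower's last applied index.
 */
class InstallSnapshotFlowSketch {
  private long lastAppliedIndex;

  InstallSnapshotFlowSketch(long lastAppliedIndex) {
    this.lastAppliedIndex = lastAppliedIndex;
  }

  /**
   * @return the index reported back to the leader, or -1 when the
   *         checkpoint is not newer than the follower's applied state.
   */
  long onInstallSnapshotNotification(long checkpointSnapshotIndex) {
    if (checkpointSnapshotIndex <= lastAppliedIndex) {
      return -1; // stale checkpoint: do not install
    }
    // In the real flow the state machine is paused, the DB is swapped
    // with the downloaded checkpoint, and OM state is reloaded here.
    lastAppliedIndex = checkpointSnapshotIndex;
    return lastAppliedIndex; // reported back to the leader
  }
}
```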





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305104705
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer 
ratisServer) {
 ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true)
 .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build();
 this.executorService = HadoopExecutors.newSingleThreadExecutor(build);
+this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor();
   }
 
   /**
* Initializes the State Machine with the given server, group and storage.
* TODO: Load the latest snapshot from the file system.
 
 Review comment:
   Correction: On reloading state, we should not read the saved snapshot index. 
Instead, we should update the snapshot index on disk.
   During normal startup, we already read the saved snapshot index.





[GitHub] [hadoop] xiaoyuyao merged pull request #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes

2019-07-18 Thread GitBox
xiaoyuyao merged pull request #1120: HDDS-1822. NPE in 
SCMCommonPolicy.chooseDatanodes
URL: https://github.com/apache/hadoop/pull/1120
 
 
   





[GitHub] [hadoop] xiaoyuyao commented on issue #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes

2019-07-18 Thread GitBox
xiaoyuyao commented on issue #1120: HDDS-1822. NPE in 
SCMCommonPolicy.chooseDatanodes
URL: https://github.com/apache/hadoop/pull/1120#issuecomment-512974489
 
 
   +1. Thanks @adoroszlai  for fixing this. I will merge this shortly. 





[jira] [Comment Edited] (HADOOP-16311) Hadoop build failure - natively on ARM (armv7) - oom_listener_main.c issues

2019-07-18 Thread AiBe Gee (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888027#comment-16888027
 ] 

AiBe Gee edited comment on HADOOP-16311 at 7/18/19 8:24 PM:


[~tonyharvey]

Sorry for the late reply, haven't checked the E-Mail I used for my Jira 
registration until today.

I put this on resolved because I learned that it was a duplicate:
 https://issues.apache.org/jira/browse/YARN-8498
 and not because I managed to resolve it.

Unfortunately, the patches presented in:
 https://issues.apache.org/jira/browse/YARN-8498
 are not working and I went for Hadoop 2.9.2, which I was able to build 
successfully on Pi2B.
 Same for HBase - I picked 1.4.9, Hive - 2.3.5 and Phoenix 4.14.2

The latest versions of the packages I presented above don't seem to work on 
ARM; the maven build scripts download some x86 artifacts and the builds are 
failing. See:
 https://issues.apache.org/jira/browse/HADOOP-16309

Off-Topic,  just to help you, these are my notes for the hadoop 2.9.2 build on 
Raspberry Pi2 ARMv7 using Slackware Linux 14.2:
 - protobuf 2.5.0 required !
 wget [https://github.com/apache/hadoop/archive/rel/release-2.9.2.tar.gz]
tar -xzpf release-2.9.2.tar.gz
cd hadoop-rel-release-2.9.2/
 - swap - using external HDD
swapoff /dev/whatever-partition-is-actually-the-swap
mkswap /dev/sda1
swapon /dev/sda1
 echo 1 > /proc/sys/vm/swappiness
 - Environment:
export PATH="$PATH:/opt/java/bin"
export M2_HOME=/opt/apache-maven-3.6.1
export "PATH=$PATH:$M2_HOME/bin"
JAVA_HOME=/opt/java
export JAVA_HOME
export ARCH=arm
export CFLAGS="-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 
-mvectorize-with-neon-quad -mfloat-abi=hard"
export CXXFLAGS="-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 
-mvectorize-with-neon-quad -mfloat-abi=hard"
export CPPFLAGS="-march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 
-mvectorize-with-neon-quad -mfloat-abi=hard"
export MAKEFLAGS="-j 3"
 - need to patch pom.xml - add a maven-surefire-plugin declaration (the XML
   tags were stripped by the mail archive; the configuration element's name is
   lost, only its value "false" survives):
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-surefire-plugin</artifactId>
     <version>3.0.0-M3</version>
     <configuration>
       ...false...
     </configuration>
   </plugin>

 - needs this too - only on ARM:
 https://issues.apache.org/jira/browse/HADOOP-9320 - patch v2.8.patch
 cd /kit/hadoop-rel-release-2.9.2/hadoop-common-project/hadoop-common/
 wget https://patch-diff.githubusercontent.com/raw/apache/hadoop/pull/224.patch
 patch < 224.patch
 - this patch too:
 https://issues.apache.org/jira/browse/HADOOP-14597
 cd 
/kit/hadoop-rel-release-2.9.2/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/
 wget [^HADOOP-14597.04.patch]
 patch < HADOOP-14597.04.patch
 cd 
/kit/hadoop-rel-release-2.9.2/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/
 wget [^HADOOP-14597.04.patch]
 patch < HADOOP-14597.04.patch

Build:
 cd /kit/hadoop-rel-release-2.9.2/
 nohup mvn package -Pdist,native,docs -DskipTests -Dtar 2>&1 | tee 
hadoop-2-9-2-build.log
 cp /kit/hadoop-rel-release-2.9.2/hadoop-dist/target/hadoop-2.9.2.tar.gz /kit/

Hope it helps.

P.S. Edit - still profoundly horrified/disgusted over how jira works - 
autoformatting, worse than Redmond Word! I have edited my post several times 
and there are still some links to some patches broken, sorry, I lost patience 
correcting all the automated crap.
A pity using this impossible tool for such a great project like hadoop ...


[GitHub] [hadoop] hadoop-yetus commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1124: HDDS-1749 : Ozone Client should 
randomize the list of nodes in pipeli…
URL: https://github.com/apache/hadoop/pull/1124#issuecomment-512967762
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 36 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 466 | trunk passed |
   | +1 | compile | 265 | trunk passed |
   | +1 | checkstyle | 73 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 868 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 146 | trunk passed |
   | 0 | spotbugs | 310 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 501 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 443 | the patch passed |
   | +1 | compile | 240 | the patch passed |
   | +1 | javac | 240 | the patch passed |
   | +1 | checkstyle | 67 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 625 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 152 | the patch passed |
   | +1 | findbugs | 513 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 281 | hadoop-hdds in the patch failed. |
   | -1 | unit | 1480 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
   | | | 6367 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdds.scm.container.placement.algorithms.TestContainerPlacementFactory |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.container.server.TestSecureContainerServer |
   |   | hadoop.ozone.container.ozoneimpl.TestSecureOzoneContainer |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestFailureHandlingByClient |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestCommitWatcher |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.8 Server=18.09.8 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1124 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 9df3745bafda 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9838a47 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/testReport/ |
   | Max. process+thread count | 5170 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/client U: hadoop-hdds/client |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1124/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hadoop] bharatviswa504 commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket request to use Cache and DoubleBuffer.

2019-07-18 Thread GitBox
bharatviswa504 commented on issue #1097: HDDS-1795. Implement S3 Delete Bucket 
request to use Cache and DoubleBuffer.
URL: https://github.com/apache/hadoop/pull/1097#issuecomment-512952595
 
 
   This is ready for review. 
   Rebased with the latest trunk, as HDDS-1689 has now been checked in.





[GitHub] [hadoop] adoroszlai commented on a change in pull request #1120: HDDS-1822. NPE in SCMCommonPolicy.chooseDatanodes

2019-07-18 Thread GitBox
adoroszlai commented on a change in pull request #1120: HDDS-1822. NPE in 
SCMCommonPolicy.chooseDatanodes
URL: https://github.com/apache/hadoop/pull/1120#discussion_r305071450
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestContainerPlacementFactory.java
 ##
 @@ -75,50 +69,25 @@ public void setup() {
   // Totally 3 racks, each has 5 datanodes
   DatanodeDetails node = TestUtils.createDatanodeDetails(
   hostname + i, rack + (i / 5));
-  datanodes.add(node);
   cluster.add(node);
 }
 
 // create mock node manager
-nodeManager = Mockito.mock(NodeManager.class);
-when(nodeManager.getNodes(NodeState.HEALTHY))
-.thenReturn(new ArrayList<>(datanodes));
-when(nodeManager.getNodeStat(anyObject()))
-.thenReturn(new SCMNodeMetric(storageCapacity, 0L, 100L));
-when(nodeManager.getNodeStat(datanodes.get(2)))
-.thenReturn(new SCMNodeMetric(storageCapacity, 90L, 10L));
-when(nodeManager.getNodeStat(datanodes.get(3)))
-.thenReturn(new SCMNodeMetric(storageCapacity, 80L, 20L));
-when(nodeManager.getNodeStat(datanodes.get(4)))
-.thenReturn(new SCMNodeMetric(storageCapacity, 70L, 30L));
-  }
-
+NodeManager nodeManager = Mockito.mock(NodeManager.class);
 
-  @Test
-  public void testDefaultPolicy() throws IOException {
 ContainerPlacementPolicy policy = ContainerPlacementPolicyFactory
 .getPolicy(conf, nodeManager, cluster, true);
-
 
 Review comment:
   Thanks @xiaoyuyao for the suggestion; it's implemented in the latest commit.





[GitHub] [hadoop] hadoop-yetus commented on issue #1123: HADOOP-16380 tombstones

2019-07-18 Thread GitBox
hadoop-yetus commented on issue #1123: HADOOP-16380 tombstones
URL: https://github.com/apache/hadoop/pull/1123#issuecomment-512944956
 
 
   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 82 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 4 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 89 | Maven dependency ordering for branch |
   | +1 | mvninstall | 1257 | trunk passed |
   | +1 | compile | 1222 | trunk passed |
   | +1 | checkstyle | 143 | trunk passed |
   | +1 | mvnsite | 127 | trunk passed |
   | +1 | shadedclient | 983 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 97 | trunk passed |
   | 0 | spotbugs | 65 | Used deprecated FindBugs config; considering switching 
to SpotBugs. |
   | +1 | findbugs | 185 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 23 | Maven dependency ordering for patch |
   | +1 | mvninstall | 78 | the patch passed |
   | +1 | compile | 1266 | the patch passed |
   | +1 | javac | 1266 | the patch passed |
   | +1 | checkstyle | 140 | the patch passed |
   | +1 | mvnsite | 122 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 717 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 99 | the patch passed |
   | +1 | findbugs | 199 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 528 | hadoop-common in the patch passed. |
   | +1 | unit | 289 | hadoop-aws in the patch passed. |
   | +1 | asflicense | 47 | The patch does not generate ASF License warnings. |
   | | | 7684 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.8 Server=18.09.8 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1123/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1123 |
   | Optional Tests | dupname asflicense mvnsite compile javac javadoc 
mvninstall unit shadedclient findbugs checkstyle |
   | uname | Linux 6b218744fdc8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9838a47 |
   | Default Java | 1.8.0_212 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1123/1/testReport/ |
   | Max. process+thread count | 1387 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1123/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hadoop] adoroszlai removed a comment on issue #1122: YARN-9679. Regular code cleanup in TestResourcePluginManager

2019-07-18 Thread GitBox
adoroszlai removed a comment on issue #1122: YARN-9679. Regular code cleanup in 
TestResourcePluginManager
URL: https://github.com/apache/hadoop/pull/1122#issuecomment-512936966
 
 
   rebuild





[GitHub] [hadoop] adoroszlai commented on issue #1122: YARN-9679. Regular code cleanup in TestResourcePluginManager

2019-07-18 Thread GitBox
adoroszlai commented on issue #1122: YARN-9679. Regular code cleanup in 
TestResourcePluginManager
URL: https://github.com/apache/hadoop/pull/1122#issuecomment-512936966
 
 
   rebuild





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305056316
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java
 ##
 @@ -0,0 +1,193 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om;
+
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.VolumeArgs;
+import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs;
+import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer;
+import org.apache.hadoop.utils.db.DBCheckpoint;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey;
+
+/**
+ * Tests the Ratis snapshots feature in OM.
+ */
+public class TestOMRatisSnapshots {
+
+  private MiniOzoneHAClusterImpl cluster = null;
+  private ObjectStore objectStore;
+  private OzoneConfiguration conf;
+  private String clusterId;
+  private String scmId;
+  private int numOfOMs = 3;
+  private static final long SNAPSHOT_THRESHOLD = 50;
+  private static final int LOG_PURGE_GAP = 50;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  @Rule
+  public Timeout timeout = new Timeout(3000_000);
+
+  /**
+   * Create a MiniDFSCluster for testing. The cluster initially has one
+   * inactive OM. So at the start of the cluster, there will be 2 active and 1
+   * inactive OM.
+   *
+   * @throws IOException
+   */
+  @Before
+  public void init() throws Exception {
+conf = new OzoneConfiguration();
+clusterId = UUID.randomUUID().toString();
+scmId = UUID.randomUUID().toString();
+conf.setLong(
+OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
+SNAPSHOT_THRESHOLD);
+conf.setInt(OMConfigKeys.OZONE_OM_RATIS_LOG_PURGE_GAP, LOG_PURGE_GAP);
+cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf)
+.setClusterId(clusterId)
+.setScmId(scmId)
+.setOMServiceId("om-service-test1")
+.setNumOfOzoneManagers(numOfOMs)
+.setNumOfActiveOMs(2)
+.build();
+cluster.waitForClusterToBeReady();
+objectStore = OzoneClientFactory.getRpcClient(conf).getObjectStore();
+  }
+
+  /**
+   * Shutdown MiniDFSCluster.
+   */
+  @After
+  public void shutdown() {
+if (cluster != null) {
+  cluster.shutdown();
+}
+  }
+
+  @Test
+  public void testInstallSnapshot() throws Exception {
+// Get the leader OM
+String leaderOMNodeId = objectStore.getClientProxy().getOMProxyProvider()
+.getCurrentProxyOMNodeId();
+OzoneManager leaderOM = cluster.getOzoneManager(leaderOMNodeId);
+OzoneManagerRatisServer leaderRatisServer = leaderOM.getOmRatisServer();
+
+// Find the inactive OM
+String followerNodeId = leaderOM.getPeerNodes().get(0).getOMNodeId();
+if (cluster.isOMActive(followerNodeId)) {
+  followerNodeId = leaderOM.getPeerNodes().get(1).getOMNodeId();
+}
+OzoneManager followerOM = cluster.getOzoneManager(followerNodeId);
+
+// Do some transactions so that the log index increases
+String userName = "user" + RandomStringUtils.randomNumeric(5);
+String adminName = "admin" + RandomStringUtils.randomNumeric(5);
+String volumeName = "volume" + RandomStringUtils.randomNumeric(5);
+

[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305053446
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot 

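The `replaceOMDBWithCheckpoint` snippet quoted above uses a backup-then-swap pattern: rename the live DB directory aside under a name tagged with the last applied index, then promote the downloaded checkpoint into the live location. Below is a minimal, self-contained sketch of that pattern using plain `java.nio`; the class name and the backup-prefix constant are illustrative assumptions, not Ozone's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DbSwapSketch {
    // Illustrative stand-in for OzoneConsts.OM_DB_BACKUP_PREFIX (assumption).
    static final String BACKUP_PREFIX = "om.db.backup.";

    // Rename the live DB directory aside (tagged with the last applied index),
    // then move the downloaded checkpoint into the live location.
    static Path replaceDbWithCheckpoint(Path dbDir, Path checkpointDir,
                                        long lastAppliedIndex) throws IOException {
        Path backup = dbDir.resolveSibling(
                BACKUP_PREFIX + lastAppliedIndex + "_" + System.currentTimeMillis());
        Files.move(dbDir, backup);          // back up the current DB
        Files.move(checkpointDir, dbDir);   // promote the checkpoint
        return backup;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("swap-demo");
        Path db = Files.createDirectory(root.resolve("om.db"));
        Files.writeString(db.resolve("CURRENT"), "old");
        Path ckpt = Files.createDirectory(root.resolve("checkpoint"));
        Files.writeString(ckpt.resolve("CURRENT"), "new");

        Path backup = replaceDbWithCheckpoint(db, ckpt, 42L);
        System.out.println(Files.readString(db.resolve("CURRENT")));     // prints "new"
        System.out.println(Files.readString(backup.resolve("CURRENT"))); // prints "old"
    }
}
```

Keeping the timestamped backup (rather than deleting the old DB) preserves a rollback path if the subsequent state reload fails, which is exactly the failure mode the quoted `catch` blocks guard against.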
[GitHub] [hadoop] avijayanhwx commented on issue #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…

2019-07-18 Thread GitBox
avijayanhwx commented on issue #1124: HDDS-1749 : Ozone Client should randomize 
the list of nodes in pipeli…
URL: https://github.com/apache/hadoop/pull/1124#issuecomment-512929603
 
 
   /label ozone





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305052177
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot 

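The review thread above centers on `installSnapshot` refusing any checkpoint whose Ratis snapshot index is not strictly ahead of the OM's last applied transaction index. A minimal sketch of that guard (the method and class names here are illustrative, not Ozone's API):

```java
public class SnapshotGuard {
    // Only install a checkpoint whose snapshot index is strictly ahead of the
    // locally applied index; otherwise the checkpoint is stale and is rejected.
    static boolean shouldInstallCheckpoint(long lastAppliedIndex,
                                           long checkpointSnapshotIndex) {
        return checkpointSnapshotIndex > lastAppliedIndex;
    }

    public static void main(String[] args) {
        System.out.println(shouldInstallCheckpoint(100, 100)); // prints "false" (equal: stale)
        System.out.println(shouldInstallCheckpoint(100, 99));  // prints "false" (behind)
        System.out.println(shouldInstallCheckpoint(100, 101)); // prints "true"  (ahead)
    }
}
```

The strict inequality matters: installing a checkpoint at the same index would discard in-flight double-buffer state for no gain, which is why the quoted code logs an error and returns null in that case.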
[GitHub] [hadoop] avijayanhwx opened a new pull request #1124: HDDS-1749 : Ozone Client should randomize the list of nodes in pipeli…

2019-07-18 Thread GitBox
avijayanhwx opened a new pull request #1124: HDDS-1749 : Ozone Client should 
randomize the list of nodes in pipeli…
URL: https://github.com/apache/hadoop/pull/1124
 
 
   …ne for reads.
   
   Currently the list of nodes returned by SCM is static and is returned in 
the same order to all clients. Ideally these should be sorted by the 
network topology and then returned to the client.
   
   However, even when network topology is not available, the SCM/client should 
randomly sort the nodes before choosing the replicas to connect to.
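The randomization described above can be sketched client-side in a few lines: shuffle the pipeline's node list before picking a replica to read from, so reads spread across datanodes instead of always hitting the first node SCM returned. This is a hypothetical sketch under those assumptions, not the actual HDDS-1749 patch; node names are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

public class RandomizeReplicas {
    // Return a randomly ordered copy of the pipeline's node list; the caller
    // then tries replicas in the shuffled order. The input list is not mutated.
    static List<String> randomizeNodes(List<String> pipelineNodes, Random rng) {
        List<String> shuffled = new ArrayList<>(pipelineNodes);
        Collections.shuffle(shuffled, rng); // uniform random permutation
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn-1", "dn-2", "dn-3");
        List<String> pick = randomizeNodes(nodes, new Random());
        // Membership is unchanged; only the order varies per client.
        System.out.println(new HashSet<>(pick).equals(new HashSet<>(nodes))); // prints "true"
    }
}
```

Passing an explicit `Random` keeps the shuffle testable; production code would typically use a shared default source of randomness.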





[GitHub] [hadoop] goiri commented on issue #1040: HDFS-13693. Remove unnecessary search in INodeDirectory.addChild during image loa…

2019-07-18 Thread GitBox
goiri commented on issue #1040: HDFS-13693. Remove unnecessary search in 
INodeDirectory.addChild during image loa…
URL: https://github.com/apache/hadoop/pull/1040#issuecomment-512928429
 
 
   The parallel life of JIRAs and PRs is driving me a little crazy.
   We have both the patch and the diff here and then we also have comments in 
both.
   
   Anyway, He Xiaoqiao seems to have comments in the JIRA.
   It would be good to get his +1.





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305048987
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot 

[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305047773
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot 

[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305047773
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot 

[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305046862
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot 
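The `replaceOMDBWithCheckpoint` snippet quoted above backs up the live DB directory with `java.nio.file.Files.move` before installing the checkpoint, so the old state can be restored if installation fails. A minimal, self-contained sketch of that backup step (the class name and `BACKUP_PREFIX` value here are illustrative stand-ins, not Ozone's actual `OzoneConsts.OM_DB_BACKUP_PREFIX`):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DbBackupSketch {
    // Illustrative stand-in for OzoneConsts.OM_DB_BACKUP_PREFIX
    static final String BACKUP_PREFIX = "om.db.backup.";

    // Move the live DB directory aside under a unique, index-stamped name so
    // the old state survives if checkpoint installation fails afterwards.
    static File backupDb(File db, long lastAppliedIndex) throws IOException {
        String name = BACKUP_PREFIX + lastAppliedIndex + "_" + System.currentTimeMillis();
        File backup = new File(db.getParentFile(), name);
        Files.move(db.toPath(), backup.toPath());
        return backup;
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("om");
        File db = new File(parent.toFile(), "om.db");
        Files.createDirectory(db.toPath());

        File backup = backupDb(db, 42L);
        // The original location is gone; the backup holds the old state.
        System.out.println(!db.exists() && backup.exists()); // prints true
    }
}
```

Because source and destination share a parent directory, the move is a cheap rename rather than a copy.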

[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305044258
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
 ##
 @@ -159,15 +159,18 @@ public void decNumKeys() {
   }
 
   public void setNumVolumes(long val) {
-this.numVolumes.incr(val);
+long oldVal = this.numVolumes.value();
+this.numVolumes.incr(val - oldVal);
 
 Review comment:
   Got it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305041874
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
 
 Review comment:
   If checkpointSnapshotIndex <= lastAppliedIndex, I think we need to clean up the downloaded DB checkpoint here.


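
The guard being discussed, `checkpointSnapshotIndex <= lastAppliedIndex`, boils down to a single comparison. A hedged sketch of the decision (the method name is illustrative, not the actual OzoneManager API):

```java
public class SnapshotGuardSketch {
    // Install the downloaded checkpoint only when its snapshot index is
    // strictly ahead of what this node has already applied; otherwise the
    // checkpoint is stale (and, per the review comment, should be cleaned up).
    static boolean shouldInstall(long lastAppliedIndex, long checkpointSnapshotIndex) {
        return checkpointSnapshotIndex > lastAppliedIndex;
    }

    public static void main(String[] args) {
        System.out.println(shouldInstall(100, 250)); // lagging follower: true
        System.out.println(shouldInstall(250, 250)); // already caught up: false
    }
}
```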



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305041731
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
 ##
 @@ -159,15 +159,18 @@ public void decNumKeys() {
   }
 
   public void setNumVolumes(long val) {
-this.numVolumes.incr(val);
+long oldVal = this.numVolumes.value();
+this.numVolumes.incr(val - oldVal);
 
 Review comment:
   Let's say numVolumes = 10. After that, the OM is restarted or the state is reloaded with a new DB checkpoint. Now the number of volumes in the VolumeTable is 20. If we increment the numVolumes metric by 20, then the metrics will show the total number of volumes to be 30, whereas it should be only 20.





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305039831
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
 Byte limit for Raft's Log Worker queue.
 
   
+  <property>
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge
 
 Review comment:
   Agree. Will update it.





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305039875
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
 ##
 @@ -159,15 +159,18 @@ public void decNumKeys() {
   }
 
   public void setNumVolumes(long val) {
-this.numVolumes.incr(val);
+long oldVal = this.numVolumes.value();
+this.numVolumes.incr(val - oldVal);
 
 Review comment:
   Then why do we do `this.numVolumes.incr(val - oldVal)`?
   Sorry, I still haven't got why we are doing this.





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305039471
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
 Byte limit for Raft's Log Worker queue.
 
   
+  
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge
 
 Review comment:
   Agree. Will update it.





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305038252
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer 
ratisServer) {
 ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true)
 .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build();
 this.executorService = HadoopExecutors.newSingleThreadExecutor(build);
+this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor();
 
 Review comment:
   Shutdown of this executor needs to be done in StateMachine stop.


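
The review point above is that every executor created in the state machine's constructor needs a matching shutdown in stop. A generic sketch of that lifecycle pairing (the class name is illustrative, not the actual OzoneManagerStateMachine):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorLifecycleSketch {
    private final ExecutorService applyExecutor = Executors.newSingleThreadExecutor();
    private final ExecutorService installSnapshotExecutor = Executors.newSingleThreadExecutor();

    // stop() must shut down every executor the constructor created;
    // otherwise their non-daemon worker threads can keep the JVM alive.
    public void stop() throws InterruptedException {
        applyExecutor.shutdown();
        installSnapshotExecutor.shutdown();
        applyExecutor.awaitTermination(5, TimeUnit.SECONDS);
        installSnapshotExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public boolean isStopped() {
        return applyExecutor.isShutdown() && installSnapshotExecutor.isShutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorLifecycleSketch sm = new ExecutorLifecycleSketch();
        sm.stop();
        System.out.println(sm.isStopped()); // prints true
    }
}
```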



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305035508
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -190,6 +197,27 @@ public TransactionContext startTransaction(
 }
   }
 
+  @Override
+  public void pause() {
+lifeCycle.transition(LifeCycle.State.PAUSING);
 
 Review comment:
   It is taken care of internally by Ratis. The StateMachineUpdater in Ratis 
checks if the state is RUNNING before applying log entries to StateMachine.


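
As the reply above notes, Ratis' StateMachineUpdater only applies log entries while the lifecycle state is RUNNING, so flipping the state to PAUSED is enough to stop new transactions from being applied. A toy model of that contract (this is a sketch of the idea, not the Ratis API):

```java
import java.util.ArrayList;
import java.util.List;

public class PausableApplierSketch {
    enum State { RUNNING, PAUSED }

    private State state = State.RUNNING;
    private final List<Long> applied = new ArrayList<>();

    // The updater loop consults the lifecycle state before applying an entry,
    // mirroring Ratis' check that the state machine is RUNNING.
    void applyIfRunning(long index) {
        if (state == State.RUNNING) {
            applied.add(index);
        }
    }

    void pause() { state = State.PAUSED; }

    void unpause() { state = State.RUNNING; }

    List<Long> applied() { return applied; }

    public static void main(String[] args) {
        PausableApplierSketch sm = new PausableApplierSketch();
        sm.applyIfRunning(1);
        sm.pause();
        sm.applyIfRunning(2);   // dropped: the state machine is paused
        sm.unpause();
        sm.applyIfRunning(3);
        System.out.println(sm.applied()); // prints [1, 3]
    }
}
```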



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305032543
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
 ##
 @@ -159,15 +159,18 @@ public void decNumKeys() {
   }
 
   public void setNumVolumes(long val) {
-this.numVolumes.incr(val);
+long oldVal = this.numVolumes.value();
+this.numVolumes.incr(val - oldVal);
 
 Review comment:
   setNumVolumes is called with the total number of rows in the VolumeTable, not with the difference between the old value and the new value:
   metrics.setNumVolumes(metadataManager.countRowsInTable(metadataManager.getVolumeTable()));


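
The `incr(val - oldVal)` idiom under discussion turns an increment-only metric into a settable gauge: adding only the difference to the current value lands exactly on the new row count instead of accumulating totals. A minimal model of the idiom (the class below is a stand-in, not Hadoop's actual MutableGaugeLong):

```java
public class GaugeIdiomSketch {
    // Increment-only metric, like the OMMetrics volume counter.
    static class LongMetric {
        private long value;
        void incr(long delta) { value += delta; }
        long value() { return value; }

        // "set" implemented on top of incr: add only the difference
        // between the new value and the old value.
        void set(long val) { incr(val - value); }
    }

    public static void main(String[] args) {
        LongMetric numVolumes = new LongMetric();
        numVolumes.set(10);  // initial load: 10 rows in the VolumeTable
        numVolumes.set(20);  // after reloading from a checkpoint: 20 rows
        System.out.println(numVolumes.value()); // prints 20, not 30
    }
}
```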



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305030842
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer 
ratisServer) {
 ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true)
 .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build();
 this.executorService = HadoopExecutors.newSingleThreadExecutor(build);
+this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor();
   }
 
   /**
* Initializes the State Machine with the given server, group and storage.
* TODO: Load the latest snapshot from the file system.
 
 Review comment:
   Yes thanks for catching this. On startup, we should read the saved ratis 
snapshot index from disk. I will update the patch.





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305030343
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
 Byte limit for Raft's Log Worker queue.
 
   
+  
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge
 
 Review comment:
   1024 transactions is 100ms worth of edits in a busy cluster. We could set 
this as high as 1M maybe to keep more history. :)





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305029959
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
 Byte limit for Raft's Log Worker queue.
 
   
+  
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge
 
 Review comment:
   Let's set this to a higher value. We don't need to be too aggressive about 
purging Ratis logs.


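
Following the suggestion above, the default can be overridden per cluster; a hedged example of raising the purge gap in ozone-site.xml (the 1000000 value is just the order of magnitude floated in this thread, not a tested recommendation):

```xml
<property>
  <name>ozone.om.ratis.log.purge.gap</name>
  <value>1000000</value>
</property>
```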



[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305029702
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -1223,6 +1231,14 @@ public void start() throws IOException {
 
 DefaultMetricsSystem.initialize("OzoneManager");
 
+// Start Ratis services
+if (omRatisServer != null) {
 
 Review comment:
   If ratis is not enabled. This is for the non-HA code path.





[GitHub] [hadoop] arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on issue #948: HDDS-1649. On installSnapshot notification from 
OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-512908324
 
 
   I am mostly +1 on this change. A couple of minor comments, and one thing I requested Bharat to double check.





[GitHub] [hadoop] hanishakoneru commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
hanishakoneru commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305028836
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -1630,6 +1630,14 @@
 Byte limit for Raft's Log Worker queue.
 
   
+  
+    <name>ozone.om.ratis.log.purge.gap</name>
+    <value>1024</value>
+    <tag>OZONE, OM, RATIS</tag>
+    <description>The minimum gap between log indices for Raft server to purge
 
 Review comment:
   No, when a snapshot is being taken, if the gap between log purges is more 
than 1024, then it will purge the logs. Snapshot frequency is not dependent on 
this.





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305019890
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -190,6 +197,27 @@ public TransactionContext startTransaction(
 }
   }
 
+  @Override
+  public void pause() {
+lifeCycle.transition(LifeCycle.State.PAUSING);
 
 Review comment:
   We have set the lifeCycle state here, but I don't see how this will pause the state machine, as this state is not used anywhere else except during initialize and unpause.





[GitHub] [hadoop] bharatviswa504 commented on issue #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on issue #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#issuecomment-512907214
 
 
   **Question:**
   In ShouldInstallSnapshot, it calls getLatestSnapshot() from stateMachineStorage. As we have our own snapshot implementation in the state machine, do we need to override that method to provide the correct snapshotInfo? Or could you provide some info on how this works?





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r304690480
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMetrics.java
 ##
 @@ -159,15 +159,18 @@ public void decNumKeys() {
   }
 
   public void setNumVolumes(long val) {
-this.numVolumes.incr(val);
+long oldVal = this.numVolumes.value();
+this.numVolumes.incr(val - oldVal);
 
 Review comment:
   I don't understand why we are subtracting here, after the reload?
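Some context on the subtraction: a metrics counter that only exposes incr() can still be forced to an absolute value by incrementing with the delta between the target and the current value, which is what incr(val - oldVal) does after a reload. A minimal sketch of the pattern (a simplified counter, not the real MutableCounterLong class):

```java
// Simplified stand-in for a metrics counter that only supports incr().
public class DeltaSetCounter {
    private long value;

    public void incr(long delta) {
        value += delta;
    }

    public long value() {
        return value;
    }

    // Set-to-absolute-value via increment-by-delta, mirroring
    // numVolumes.incr(val - oldVal) in the patch.
    public void setTo(long target) {
        incr(target - value());
    }
}
```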





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305027589
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerStateMachine.java
 ##
 @@ -87,19 +92,21 @@ public OzoneManagerStateMachine(OzoneManagerRatisServer 
ratisServer) {
 ThreadFactory build = new ThreadFactoryBuilder().setDaemon(true)
 .setNameFormat("OM StateMachine ApplyTransaction Thread - %d").build();
 this.executorService = HadoopExecutors.newSingleThreadExecutor(build);
+this.installSnapshotExecutor = HadoopExecutors.newSingleThreadExecutor();
   }
 
   /**
* Initializes the State Machine with the given server, group and storage.
* TODO: Load the latest snapshot from the file system.
 
 Review comment:
   This TODO looks a little worrying. Something we need to address now?
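One common shape for the fix to this TODO is to persist the last applied index when a snapshot is taken and read it back during initialization. A minimal sketch under that assumption (the file name and plain-text format here are hypothetical, not what OM actually uses):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotIndexFile {
    // Persist the last applied index when a snapshot is taken.
    public static void save(Path file, long index) throws IOException {
        Files.write(file, Long.toString(index).getBytes(StandardCharsets.UTF_8));
    }

    // On initialize(), reload the index; -1 means no snapshot exists yet.
    public static long load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return -1L;
        }
        return Long.parseLong(new String(Files.readAllBytes(file),
            StandardCharsets.UTF_8).trim());
    }
}
```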





[GitHub] [hadoop] bharatviswa504 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
bharatviswa504 commented on a change in pull request #948: HDDS-1649. On 
installSnapshot notification from OM leader, download checkpoint and reload OM 
state
URL: https://github.com/apache/hadoop/pull/948#discussion_r304689358
 
 

 ##
 File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java
 ##
 @@ -0,0 +1,193 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations 
under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om;
+
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.VolumeArgs;
+import org.apache.hadoop.ozone.om.helpers.OmVolumeArgs;
+import org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer;
+import org.apache.hadoop.utils.db.DBCheckpoint;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hadoop.ozone.om.TestOzoneManagerHA.createKey;
+
+/**
+ * Tests the Ratis snapshots feature in OM.
+ */
+public class TestOMRatisSnapshots {
+
+  private MiniOzoneHAClusterImpl cluster = null;
+  private ObjectStore objectStore;
+  private OzoneConfiguration conf;
+  private String clusterId;
+  private String scmId;
+  private int numOfOMs = 3;
+  private static final long SNAPSHOT_THRESHOLD = 50;
+  private static final int LOG_PURGE_GAP = 50;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  @Rule
+  public Timeout timeout = new Timeout(3000_000);
+
+  /**
+   * Create a MiniDFSCluster for testing. The cluster initially has one
 
 Review comment:
   Minor: MiniOzoneCluster





[GitHub] [hadoop] arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state

2019-07-18 Thread GitBox
arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot 
notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r305026829
 
 

 ##
 File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##
 @@ -3150,6 +3169,195 @@ public boolean setAcl(OzoneObj obj, List 
acls) throws IOException {
 }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+if (omSnapshotProvider == null) {
+  LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+  "nodes.");
+  return null;
+}
+
+DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
+
+// Check if current ratis log index is smaller than the downloaded
+// snapshot index. If yes, proceed by stopping the ratis server so that
+// the OM state can be re-initialized. If no, then do not proceed with
+// installSnapshot.
+long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+if (checkpointSnapshotIndex <= lastAppliedIndex) {
+  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+  "applied index: {} is greater than or equal to the checkpoint's " +
+  "snapshot index: {}", leaderId, lastAppliedIndex,
+  checkpointSnapshotIndex);
+  return null;
+}
+
+// Pause the State Machine so that no new transactions can be applied.
+// This action also clears the OM Double Buffer so that if there are any
+// pending transactions in the buffer, they are discarded.
+// TODO: The Ratis server should also be paused here. This is required
+//  because a leader election might happen while the snapshot
+//  installation is in progress and the new leader might start sending
+//  append log entries to the ratis server.
+omRatisServer.getOmStateMachine().pause();
+
+try {
+  replaceOMDBWithCheckpoint(lastAppliedIndex, omDBcheckpoint);
+} catch (Exception e) {
+  LOG.error("OM DB checkpoint replacement with new downloaded checkpoint " +
+  "failed.", e);
+  return null;
+}
+
+// Reload the OM DB store with the new checkpoint.
+// Restart (unpause) the state machine and update its last applied index
+// to the installed checkpoint's snapshot index.
+try {
+  reloadOMState();
+  omRatisServer.getOmStateMachine().unpause(checkpointSnapshotIndex);
+} catch (IOException e) {
+  LOG.error("Failed to reload OM state with new DB checkpoint.", e);
+  return null;
+}
+
+// TODO: We should only return the snapshotIndex to the leader.
+//  Should be fixed after RATIS-586
+TermIndex newTermIndex = TermIndex.newTermIndex(0,
+checkpointSnapshotIndex);
+
+return newTermIndex;
+  }
+
+  /**
+   * Download the latest OM DB checkpoint from the leader OM.
+   * @param leaderId OMNodeID of the leader OM node.
+   * @return latest DB checkpoint from leader OM.
+   */
+  private DBCheckpoint getDBCheckpointFromLeader(String leaderId) {
+LOG.info("Downloading checkpoint from leader OM {} and reloading state " +
+"from the checkpoint.", leaderId);
+
+try {
+  return omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+} catch (IOException e) {
+  LOG.error("Failed to download checkpoint from OM leader {}", leaderId, 
e);
+}
+return null;
+  }
+
+  /**
+   * Replace the current OM DB with the new DB checkpoint.
+   * @param lastAppliedIndex the last applied index in the current OM DB.
+   * @param omDBcheckpoint the new DB checkpoint
+   * @throws Exception
+   */
+  void replaceOMDBWithCheckpoint(long lastAppliedIndex,
+  DBCheckpoint omDBcheckpoint) throws Exception {
+// Stop the DB first
+DBStore store = metadataManager.getStore();
+store.close();
+
+// Take a backup of the current DB
+File db = store.getDbLocation();
+String dbBackupName = OzoneConsts.OM_DB_BACKUP_PREFIX +
+lastAppliedIndex + "_" + System.currentTimeMillis();
+File dbBackup = new File(db.getParentFile(), dbBackupName);
+
+try {
+  Files.move(db.toPath(), dbBackup.toPath());
+} catch (IOException e) {
+  LOG.error("Failed to create a backup of the current DB. Aborting " +
+  "snapshot installation.");
+  
