[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=306867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-306867
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 05/Sep/19 03:39
Start Date: 05/Sep/19 03:39
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 306867)
Time Spent: 5h 50m  (was: 5h 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.
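The verification itself is conceptually simple: re-read each chunk file in
bytesPerChecksum-sized windows and compare a freshly computed checksum against
the stored one. A minimal self-contained sketch of that loop, substituting
java.util.zip.CRC32 for Ozone's Checksum/ChecksumData classes and omitting the
bandwidth throttling the actual patch applies:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

public final class ChunkVerifierSketch {

  /** Verifies one chunk file against its stored per-window checksums. */
  static void verifyChunk(String chunkFile, long[] expectedCrcs,
      int bytesPerChecksum) throws IOException {
    byte[] buffer = new byte[bytesPerChecksum];
    try (InputStream in = new FileInputStream(chunkFile)) {
      for (int i = 0; i < expectedCrcs.length; i++) {
        int n = in.read(buffer);
        if (n == -1) {
          // The file ended before all recorded checksums were consumed.
          throw new IOException("Chunk " + chunkFile
              + " is shorter than its metadata indicates (window " + i + ")");
        }
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, n); // checksum only the bytes actually read
        if (crc.getValue() != expectedCrcs[i]) {
          throw new IOException("Checksum mismatch in " + chunkFile
              + " at window " + i);
        }
      }
    }
  }
}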



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=306866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-306866
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 05/Sep/19 03:39
Start Date: 05/Sep/19 03:39
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-528183034
 
 
   @hgadre  Thanks for the contribution, and thanks to everyone else for the reviews. I 
have committed this patch to trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 306866)
Time Spent: 5h 50m  (was: 5h 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=304505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-304505
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 30/Aug/19 18:30
Start Date: 30/Aug/19 18:30
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r319630085
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
 Review comment:
   Done
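The refactor above folds the DB handle and the block iterator into a single
try-with-resources. One detail worth noting: Java closes multiple resources in
reverse declaration order, so the iterator is closed before the ref-counted DB
handle is released. In sketch form:

try (ReferenceCountedDB db =
    BlockUtils.getDB(onDiskContainerData, checkConfig);
    KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
        new File(onDiskContainerData.getContainerPath()))) {
  // iterate blocks here; kvIter.close() runs first, then db.close()
}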
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 304505)
Time Spent: 5h 40m  (was: 5.5h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=301895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301895
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 27/Aug/19 11:30
Start Date: 27/Aug/19 11:30
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-525260061
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 44 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 24 | Maven dependency ordering for branch |
   | +1 | mvninstall | 636 | trunk passed |
   | +1 | compile | 416 | trunk passed |
   | +1 | checkstyle | 80 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 930 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 177 | trunk passed |
   | 0 | spotbugs | 464 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 670 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 35 | Maven dependency ordering for patch |
   | +1 | mvninstall | 562 | the patch passed |
   | +1 | compile | 388 | the patch passed |
   | +1 | javac | 388 | the patch passed |
   | +1 | checkstyle | 82 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 681 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 171 | the patch passed |
   | +1 | findbugs | 663 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 320 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2370 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 8468 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerHandler
 |
   |   | hadoop.ozone.client.rpc.TestWatchForCommit |
   |   | hadoop.ozone.TestOzoneConfigurationFields |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 7e27e49f0d2b 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 3329257 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/8/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/8/testReport/ |
   | Max. process+thread count | 5326 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/8/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301895)
Time Spent: 5.5h  (was: 5h 20m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Background 

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=297917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297917
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 20/Aug/19 14:26
Start Date: 20/Aug/19 14:26
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-523039889
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 71 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for branch |
   | -1 | mvninstall | 151 | hadoop-ozone in trunk failed. |
   | -1 | compile | 54 | hadoop-ozone in trunk failed. |
   | +1 | checkstyle | 55 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 880 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 176 | trunk passed |
   | 0 | spotbugs | 252 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | -1 | findbugs | 129 | hadoop-ozone in trunk failed. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 31 | Maven dependency ordering for patch |
   | -1 | mvninstall | 159 | hadoop-ozone in the patch failed. |
   | -1 | compile | 60 | hadoop-ozone in the patch failed. |
   | -1 | javac | 60 | hadoop-ozone in the patch failed. |
   | +1 | checkstyle | 65 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 681 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 173 | the patch passed |
   | -1 | findbugs | 120 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | +1 | unit | 348 | hadoop-hdds in the patch passed. |
   | -1 | unit | 121 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 33 | The patch does not generate ASF License warnings. |
   | | | 4429 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux a13bd4bee649 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / bd92462 |
   | Default Java | 1.8.0_222 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/branch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/branch-compile-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/branch-findbugs-hadoop-ozone.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/patch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | javac | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/patch-findbugs-hadoop-ozone.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/testReport/ |
   | Max. process+thread count | 525 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/6/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=297438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297438
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 19/Aug/19 21:48
Start Date: 19/Aug/19 21:48
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r315426622
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerMetadataScanner.java
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.ozoneimpl;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.net.ntp.TimeStamp;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * This class is responsible for performing metadata verification of the
+ * containers.
+ */
+public class ContainerMetadataScanner extends Thread {
+  public static final Logger LOG =
+  LoggerFactory.getLogger(ContainerMetadataScanner.class);
+
+  private final ContainerController controller;
+  /**
+   * True if the thread is stopping.
+   * Protected by this object's lock.
+   */
+  private boolean stopping = false;
+
+  public ContainerMetadataScanner(ContainerController controller) {
+this.controller = controller;
+setName("ContainerMetadataScanner");
+setDaemon(true);
+  }
+
+  @Override
+  public void run() {
+/**
+ * the outer daemon loop exits on down()
+ */
+LOG.info("Background ContainerMetadataScanner starting up");
+while (!stopping) {
+  scrub();
+  if (!stopping) {
+try {
+  Thread.sleep(300000); /* 5 min between scans */
 
 Review comment:
   Done.
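The review comment this "Done" answers is not quoted here; a plausible
follow-up for a hard-coded pause like this is to make the interval
configurable. A sketch of the loop under that assumption, where the
scanIntervalMillis field is hypothetical rather than the patch's actual API:

@Override
public void run() {
  LOG.info("Background ContainerMetadataScanner starting up");
  while (!stopping) {
    scrub();
    if (!stopping) {
      try {
        // Configurable pause between scans; 300000 ms reproduces the
        // previous hard-coded 5 minutes.
        Thread.sleep(scanIntervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }
}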
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 297438)
Time Spent: 4h 50m  (was: 4h 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=297441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297441
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 19/Aug/19 21:48
Start Date: 19/Aug/19 21:48
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r315426753
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java
 ##
 @@ -68,11 +68,16 @@
   public static final String HDDS_CONTAINERSCRUB_ENABLED =
   "hdds.containerscrub.enabled";
   public static final boolean HDDS_CONTAINERSCRUB_ENABLED_DEFAULT = false;
+
   public static final boolean HDDS_SCM_SAFEMODE_ENABLED_DEFAULT = true;
   public static final String HDDS_SCM_SAFEMODE_MIN_DATANODE =
   "hdds.scm.safemode.min.datanode";
   public static final int HDDS_SCM_SAFEMODE_MIN_DATANODE_DEFAULT = 1;
 
+  public static final String HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND =
+  "hdds.container.scanner.volume.bytes.per.second";
 
 Review comment:
   Updated the patch to use configuration based APIs.
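"Configuration based APIs" here refers to the typed configuration object the
tests elsewhere in this thread use: scrubber settings are materialized via
conf.getObject(ContainerScrubberConfiguration.class) instead of being read
through raw string keys such as
"hdds.container.scanner.volume.bytes.per.second". A sketch of the consuming
side; only getObject and getBandwidthPerVolume are confirmed by this thread,
and the import location of ContainerScrubberConfiguration is omitted as an
assumption:

import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.hdfs.util.DataTransferThrottler;

final class ScrubberConfigSketch {
  static DataTransferThrottler buildThrottler() {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Typed view over the scrubber settings.
    ContainerScrubberConfiguration sc =
        conf.getObject(ContainerScrubberConfiguration.class);
    // Bound the scrubber's disk reads per volume, as the tests do.
    return new DataTransferThrottler(sc.getBandwidthPerVolume());
  }
}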
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 297441)
Time Spent: 5h  (was: 4h 50m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=297437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297437
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 19/Aug/19 21:47
Start Date: 19/Aug/19 21:47
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r315426293
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
-  private void iterateBlockDB(ReferenceCountedDB db)
-  throws IOException {
-Preconditions.checkState(db != null);
-
-// get "normal" keys from the Block DB
-try(KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
-new File(onDiskContainerData.getContainerPath()))) {
-
-  // ensure there is a chunk file for each key in the DB
-  while (kvIter.hasNext()) {
+  while(kvIter.hasNext()) {
 BlockData block = kvIter.nextBlock();
-
-List<ContainerProtos.ChunkInfo> chunkInfoList = block.getChunks();
-for (ContainerProtos.ChunkInfo chunk : chunkInfoList) {
-  File chunkFile;
-  chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
+for(ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+  File chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
   ChunkInfo.getFromProtoBuf(chunk));
-
   if (!chunkFile.exists()) {
 // concurrent mutation in Block DB? lookup the block again.
 byte[] bdata = db.getStore().get(
 Longs.toByteArray(block.getBlockID().getLocalID()));
-if (bdata == null) {
-  LOG.trace("concurrency with delete, ignoring deleted block");
-  break; // skip to next block from kvIter
-} else {
-  String errorStr = "Missing chunk file "
-  + chunkFile.getAbsolutePath();
-  throw new IOException(errorStr);
+if (bdata != null) {
+  throw new IOException("Missing chunk file "
+  + chunkFile.getAbsolutePath());
+}
+  } else if (chunk.getChecksumData().getType()
+  != ContainerProtos.ChecksumType.NONE){
+int length = chunk.getChecksumData().getChecksumsList().size();
+ChecksumData cData = new ChecksumData(
+chunk.getChecksumData().getType(),
+chunk.getChecksumData().getBytesPerChecksum(),
+chunk.getChecksumData().getChecksumsList());
+long bytesRead = 0;
+byte[] buffer = new byte[cData.getBytesPerChecksum()];
+try (InputStream fs = new FileInputStream(chunkFile)) {
+  int i = 0, v = 0;
+  for (; i < length; i++) {
+v = fs.read(buffer);
+if (v == -1) {
+  break;
+}
+bytesRead += v;
+throttler.throttle(v, canceler);
+Checksum cal = new Checksum(cData.getChecksumType(),
+cData.getBytesPerChecksum());
+ByteString expected = cData.getChecksums().get(i);
+ByteString actual = cal.computeChecksum(buffer)
+.getChecksums().get(0);
+if (!Arrays.equals(expected.toByteArray(),
+actual.toByteArray())) {
+  throw new OzoneChecksumException(String
+  .format("Inconsistent read for chunk=%s len=%d expected" 
+
+  " checksum %s actual checksum %s",
+  chunk.getChunkName(), chunk.getLen(),
+  Arrays.toString(expected.toByteArray()),
+  Arrays.toString(actual.toByteArray())));
+}
+
+  }
+  if (v == -1 && i < length) {
+throw new OzoneChecksumException(String
+.format("Inconsistent read for chunk=%s expected length=%d"
 
 Review comment:
   done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to 

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=297436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297436
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 19/Aug/19 21:44
Start Date: 19/Aug/19 21:44
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r315425380
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
 ##
 @@ -120,10 +133,70 @@ public TestKeyValueContainerCheck(String metadataImpl) {
 container.close();
 
 // next run checks on a Closed Container
-valid = kvCheck.fullCheck();
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+c.getBandwidthPerVolume()), null);
 assertTrue(valid);
   }
 
+  /**
+   * Sanity test for when corruption is induced.
+   * @throws Exception
+   */
+  @Test
+  public void testKeyValueContainerCheckCorruption() throws Exception {
+long containerID = 102;
+int deletedBlocks = 1;
+int normalBlocks = 3;
+int chunksPerBlock = 4;
+boolean valid = false;
+ContainerScrubberConfiguration sc = conf.getObject(
+ContainerScrubberConfiguration.class);
+
+// test Closed Container
+createContainerWithBlocks(containerID, normalBlocks, deletedBlocks, 65536,
+chunksPerBlock);
+File chunksPath = new File(containerData.getChunksPath());
+assertTrue(chunksPath.listFiles().length
+== (deletedBlocks + normalBlocks) * chunksPerBlock);
+
+container.close();
+
+KeyValueContainerCheck kvCheck =
+new KeyValueContainerCheck(containerData.getMetadataPath(), conf,
+containerID);
+
+File metaDir = new File(containerData.getMetadataPath());
+File dbFile = KeyValueContainerLocationUtil
+.getContainerDBFile(metaDir, containerID);
+containerData.setDbFile(dbFile);
+try(ReferenceCountedDB db =
+BlockUtils.getDB(containerData, conf);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(containerData.getContainerPath()))) {
+  BlockData block = kvIter.nextBlock();
+  assertTrue(!block.getChunks().isEmpty());
+  ContainerProtos.ChunkInfo c = block.getChunks().get(0);
+  File chunkFile = ChunkUtils.getChunkFile(containerData,
+  ChunkInfo.getFromProtoBuf(c));
+  long length = chunkFile.length();
+  assertTrue(length > 0);
+  // forcefully truncate the file to induce failure.
+  try (RandomAccessFile file = new RandomAccessFile(chunkFile, "rws")) {
+file.setLength(length / 2);
+  }
+  assertEquals(length/2, chunkFile.length());
+}
+
+// metadata check should pass.
+valid = kvCheck.fastCheck();
+assertTrue(valid);
+
+// checksum validation should fail.
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+sc.getBandwidthPerVolume()), null);
+assertFalse(valid);
+  }
+
   /**
* Creates a container with normal and deleted blocks.
* First it will insert normal blocks, and then it will insert
 
 Review comment:
   Not sure I am following you. Can you elaborate on which part you find 
misleading? This function was present before this patch ...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 297436)
Time Spent: 4.5h  (was: 4h 20m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=296479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296479
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 16/Aug/19 17:59
Start Date: 16/Aug/19 17:59
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-522098277
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 44 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 68 | Maven dependency ordering for branch |
   | +1 | mvninstall | 635 | trunk passed |
   | +1 | compile | 419 | trunk passed |
   | +1 | checkstyle | 66 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 830 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 154 | trunk passed |
   | 0 | spotbugs | 465 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 658 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 32 | Maven dependency ordering for patch |
   | +1 | mvninstall | 563 | the patch passed |
   | +1 | compile | 345 | the patch passed |
   | +1 | javac | 345 | the patch passed |
   | +1 | checkstyle | 62 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 636 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 153 | the patch passed |
   | +1 | findbugs | 619 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 274 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2274 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 45 | The patch does not generate ASF License warnings. |
   | | | 8019 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.TestOzoneConfigurationFields |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 4033a5d040c1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9a1d8cf |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/5/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/5/testReport/ |
   | Max. process+thread count | 4312 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/5/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296479)
Time Spent: 4h 20m  (was: 4h 10m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-08 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=291221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-291221
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 08/Aug/19 12:15
Start Date: 08/Aug/19 12:15
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-519493244
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 66 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for branch |
   | +1 | mvninstall | 653 | trunk passed |
   | +1 | compile | 398 | trunk passed |
   | +1 | checkstyle | 67 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 883 | branch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 46 | hadoop-ozone in trunk failed. |
   | 0 | spotbugs | 432 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 657 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 31 | Maven dependency ordering for patch |
   | +1 | mvninstall | 602 | the patch passed |
   | +1 | compile | 395 | the patch passed |
   | +1 | javac | 395 | the patch passed |
   | +1 | checkstyle | 78 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 1 | The patch has no whitespace issues. |
   | +1 | shadedclient | 670 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 88 | hadoop-ozone generated 12 new + 1 unchanged - 0 fixed 
= 13 total (was 1) |
   | +1 | findbugs | 764 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 252 | hadoop-hdds in the patch failed. |
   | -1 | unit | 2290 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 40 | The patch does not generate ASF License warnings. |
   | | | 8288 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerCommandHandler
 |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.TestOzoneConfigurationFields |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestReadRetries |
   |   | hadoop.ozone.om.TestKeyManagerImpl |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux e800dfb0faa2 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 397a563 |
   | Default Java | 1.8.0_212 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/artifact/out/branch-javadoc-hadoop-ozone.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/artifact/out/diff-javadoc-javadoc-hadoop-ozone.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/testReport/ |
   | Max. process+thread count | 4992 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/4/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=290977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290977
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 08/Aug/19 03:53
Start Date: 08/Aug/19 03:53
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-519352255
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 38 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 25 | Maven dependency ordering for branch |
   | +1 | mvninstall | 617 | trunk passed |
   | +1 | compile | 412 | trunk passed |
   | +1 | checkstyle | 72 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 918 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 164 | trunk passed |
   | 0 | spotbugs | 478 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 685 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 32 | Maven dependency ordering for patch |
   | +1 | mvninstall | 554 | the patch passed |
   | +1 | compile | 386 | the patch passed |
   | +1 | javac | 386 | the patch passed |
   | +1 | checkstyle | 77 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 742 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 163 | the patch passed |
   | +1 | findbugs | 658 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 336 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2005 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 43 | The patch does not generate ASF License warnings. |
   | | | 8091 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.TestOzoneConfigurationFields |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.om.TestKeyManagerImpl |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 95e336b2f0ac 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 70b4617 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/3/testReport/ |
   | Max. process+thread count | 5301 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290977)
Time Spent: 4h  (was: 3h 50m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=288049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288049
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 02/Aug/19 16:32
Start Date: 02/Aug/19 16:32
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-517765490
 
 
   Thanks @hgadre for working on this. The patch looks good to me, with a few 
minor comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288049)
Time Spent: 3h 50m  (was: 3h 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=288047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288047
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 02/Aug/19 16:30
Start Date: 02/Aug/19 16:30
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r310207626
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
-  private void iterateBlockDB(ReferenceCountedDB db)
-  throws IOException {
-Preconditions.checkState(db != null);
-
-// get "normal" keys from the Block DB
-try(KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
-new File(onDiskContainerData.getContainerPath()))) {
-
-  // ensure there is a chunk file for each key in the DB
-  while (kvIter.hasNext()) {
+  while(kvIter.hasNext()) {
 BlockData block = kvIter.nextBlock();
-
-List<ContainerProtos.ChunkInfo> chunkInfoList = block.getChunks();
-for (ContainerProtos.ChunkInfo chunk : chunkInfoList) {
-  File chunkFile;
-  chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
+for(ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+  File chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
   ChunkInfo.getFromProtoBuf(chunk));
-
   if (!chunkFile.exists()) {
 // concurrent mutation in Block DB? lookup the block again.
 byte[] bdata = db.getStore().get(
 Longs.toByteArray(block.getBlockID().getLocalID()));
-if (bdata == null) {
-  LOG.trace("concurrency with delete, ignoring deleted block");
-  break; // skip to next block from kvIter
-} else {
-  String errorStr = "Missing chunk file "
-  + chunkFile.getAbsolutePath();
-  throw new IOException(errorStr);
+if (bdata != null) {
+  throw new IOException("Missing chunk file "
+  + chunkFile.getAbsolutePath());
+}
+  } else if (chunk.getChecksumData().getType()
+  != ContainerProtos.ChecksumType.NONE){
+int length = chunk.getChecksumData().getChecksumsList().size();
+ChecksumData cData = new ChecksumData(
+chunk.getChecksumData().getType(),
+chunk.getChecksumData().getBytesPerChecksum(),
+chunk.getChecksumData().getChecksumsList());
+long bytesRead = 0;
+byte[] buffer = new byte[cData.getBytesPerChecksum()];
+try (InputStream fs = new FileInputStream(chunkFile)) {
+  int i = 0, v = 0;
+  for (; i < length; i++) {
+v = fs.read(buffer);
+if (v == -1) {
+  break;
+}
+bytesRead += v;
+throttler.throttle(v, canceler);
+Checksum cal = new Checksum(cData.getChecksumType(),
+cData.getBytesPerChecksum());
+ByteString expected = cData.getChecksums().get(i);
+ByteString actual = cal.computeChecksum(buffer)
+.getChecksums().get(0);
+if (!Arrays.equals(expected.toByteArray(),
+actual.toByteArray())) {
+  throw new OzoneChecksumException(String
+  .format("Inconsistent read for chunk=%s len=%d expected" 
+
+  " checksum %s actual checksum %s",
+  chunk.getChunkName(), chunk.getLen(),
+  Arrays.toString(expected.toByteArray()),
+  Arrays.toString(actual.toByteArray())));
+}
+
+  }
+  if (v == -1 && i < length) {
+throw new OzoneChecksumException(String
+.format("Inconsistent read for chunk=%s expected length=%d"
 
 Review comment:
   It might be a good idea to log the blockId as well in the exception msg.
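A sketch of what the message could look like with the block id folded in,
reusing the block variable already in scope in the loop above (illustrative
wording, not the committed text):

throw new OzoneChecksumException(String.format(
    "Inconsistent read for chunk=%s block=%s expected length=%d actual length=%d",
    chunk.getChunkName(), block.getBlockID(), chunk.getLen(), bytesRead));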
 

This is an automated message from the Apache Git Service.

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=288033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288033
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 02/Aug/19 16:22
Start Date: 02/Aug/19 16:22
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r310204664
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
 ##
 @@ -120,10 +133,70 @@ public TestKeyValueContainerCheck(String metadataImpl) {
 container.close();
 
 // next run checks on a Closed Container
-valid = kvCheck.fullCheck();
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+c.getBandwidthPerVolume()), null);
 assertTrue(valid);
   }
 
+  /**
+   * Sanity test for when corruption is induced.
+   * @throws Exception
+   */
+  @Test
+  public void testKeyValueContainerCheckCorruption() throws Exception {
+long containerID = 102;
+int deletedBlocks = 1;
+int normalBlocks = 3;
+int chunksPerBlock = 4;
+boolean valid = false;
+ContainerScrubberConfiguration sc = conf.getObject(
+ContainerScrubberConfiguration.class);
+
+// test Closed Container
+createContainerWithBlocks(containerID, normalBlocks, deletedBlocks, 65536,
+chunksPerBlock);
+File chunksPath = new File(containerData.getChunksPath());
+assertTrue(chunksPath.listFiles().length
+== (deletedBlocks + normalBlocks) * chunksPerBlock);
+
+container.close();
+
+KeyValueContainerCheck kvCheck =
+new KeyValueContainerCheck(containerData.getMetadataPath(), conf,
+containerID);
+
+File metaDir = new File(containerData.getMetadataPath());
+File dbFile = KeyValueContainerLocationUtil
+.getContainerDBFile(metaDir, containerID);
+containerData.setDbFile(dbFile);
+try(ReferenceCountedDB db =
+BlockUtils.getDB(containerData, conf);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(containerData.getContainerPath()))) {
+  BlockData block = kvIter.nextBlock();
+  assertTrue(!block.getChunks().isEmpty());
+  ContainerProtos.ChunkInfo c = block.getChunks().get(0);
+  File chunkFile = ChunkUtils.getChunkFile(containerData,
+  ChunkInfo.getFromProtoBuf(c));
+  long length = chunkFile.length();
+  assertTrue(length > 0);
+  // forcefully truncate the file to induce failure.
+  try (RandomAccessFile file = new RandomAccessFile(chunkFile, "rws")) {
+file.setLength(length / 2);
+  }
+  assertEquals(length/2, chunkFile.length());
+}
+
+// metadata check should pass.
+valid = kvCheck.fastCheck();
+assertTrue(valid);
+
+// checksum validation should fail.
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+sc.getBandwidthPerVolume()), null);
+assertFalse(valid);
+  }
+
   /**
* Creates a container with normal and deleted blocks.
* First it will insert normal blocks, and then it will insert
 
 Review comment:
   Misleading comment
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 288033)
Time Spent: 3.5h  (was: 3h 20m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-08-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=287419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-287419
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 02/Aug/19 01:17
Start Date: 02/Aug/19 01:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-517511298
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 38 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 24 | Maven dependency ordering for branch |
   | +1 | mvninstall | 596 | trunk passed |
   | +1 | compile | 380 | trunk passed |
   | +1 | checkstyle | 75 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 947 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 176 | trunk passed |
   | 0 | spotbugs | 492 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 726 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 33 | Maven dependency ordering for patch |
   | +1 | mvninstall | 587 | the patch passed |
   | +1 | compile | 394 | the patch passed |
   | +1 | javac | 394 | the patch passed |
   | +1 | checkstyle | 77 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 732 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 179 | the patch passed |
   | +1 | findbugs | 673 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 349 | hadoop-hdds in the patch passed. |
   | -1 | unit | 1679 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 47 | The patch does not generate ASF License warnings. |
   | | | 7863 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.TestOzoneConfigurationFields |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 75460ae7d374 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / b94eba9 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/2/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/2/testReport/ |
   | Max. process+thread count | 4860 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/2/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 287419)
Time Spent: 3h 20m  (was: 3h 10m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283559
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 19:28
Start Date: 26/Jul/19 19:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307881322
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
-  private void iterateBlockDB(ReferenceCountedDB db)
-  throws IOException {
-Preconditions.checkState(db != null);
-
-// get "normal" keys from the Block DB
-try(KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
-new File(onDiskContainerData.getContainerPath()))) {
-
-  // ensure there is a chunk file for each key in the DB
-  while (kvIter.hasNext()) {
+  while(kvIter.hasNext()) {
 BlockData block = kvIter.nextBlock();
-
-List<ContainerProtos.ChunkInfo> chunkInfoList = block.getChunks();
-for (ContainerProtos.ChunkInfo chunk : chunkInfoList) {
-  File chunkFile;
-  chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
+for(ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+  File chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
   ChunkInfo.getFromProtoBuf(chunk));
-
   if (!chunkFile.exists()) {
 // concurrent mutation in Block DB? lookup the block again.
 byte[] bdata = db.getStore().get(
 Longs.toByteArray(block.getBlockID().getLocalID()));
-if (bdata == null) {
-  LOG.trace("concurrency with delete, ignoring deleted block");
-  break; // skip to next block from kvIter
-} else {
-  String errorStr = "Missing chunk file "
-  + chunkFile.getAbsolutePath();
-  throw new IOException(errorStr);
+if (bdata != null) {
+  throw new IOException("Missing chunk file "
+  + chunkFile.getAbsolutePath());
+}
+  } else if (chunk.getChecksumData().getType()
+  != ContainerProtos.ChecksumType.NONE){
 
 Review comment:
   makes sense.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283559)
Time Spent: 3h 10m  (was: 3h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283499
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 17:28
Start Date: 26/Jul/19 17:28
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307839297
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java
 ##
 @@ -68,11 +68,16 @@
   public static final String HDDS_CONTAINERSCRUB_ENABLED =
   "hdds.containerscrub.enabled";
   public static final boolean HDDS_CONTAINERSCRUB_ENABLED_DEFAULT = false;
+
   public static final boolean HDDS_SCM_SAFEMODE_ENABLED_DEFAULT = true;
   public static final String HDDS_SCM_SAFEMODE_MIN_DATANODE =
   "hdds.scm.safemode.min.datanode";
   public static final int HDDS_SCM_SAFEMODE_MIN_DATANODE_DEFAULT = 1;
 
+  public static final String HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND =
+  "hdds.container.scanner.volume.bytes.per.second";
 
 Review comment:
   @swagle the property name is loosely modeled after HDFS, so I think we can 
keep it that way.
   @anuengineer thanks for the info. Let me refactor the logic here to use the 
configuration-based API.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283499)
Time Spent: 2h 50m  (was: 2h 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283500
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 17:29
Start Date: 26/Jul/19 17:29
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307839423
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java
 ##
 @@ -68,11 +68,16 @@
   public static final String HDDS_CONTAINERSCRUB_ENABLED =
   "hdds.containerscrub.enabled";
   public static final boolean HDDS_CONTAINERSCRUB_ENABLED_DEFAULT = false;
+
   public static final boolean HDDS_SCM_SAFEMODE_ENABLED_DEFAULT = true;
   public static final String HDDS_SCM_SAFEMODE_MIN_DATANODE =
   "hdds.scm.safemode.min.datanode";
   public static final int HDDS_SCM_SAFEMODE_MIN_DATANODE_DEFAULT = 1;
 
+  public static final String HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND =
+  "hdds.container.scanner.volume.bytes.per.second";
+  public static final long
+  HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT = 1048576L;
 
 
 Review comment:
   ok. let me use that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283500)
Time Spent: 3h  (was: 2h 50m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283498
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 17:25
Start Date: 26/Jul/19 17:25
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307838235
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerMetadataScanner.java
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.ozoneimpl;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.net.ntp.TimeStamp;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * This class is responsible to perform metadata verification of the
+ * containers.
+ */
+public class ContainerMetadataScanner extends Thread {
+  public static final Logger LOG =
+  LoggerFactory.getLogger(ContainerMetadataScanner.class);
+
+  private final ContainerController controller;
+  /**
+   * True if the thread is stopping.
+   * Protected by this object's lock.
+   */
+  private boolean stopping = false;
+
+  public ContainerMetadataScanner(ContainerController controller) {
+this.controller = controller;
+setName("ContainerMetadataScanner");
+setDaemon(true);
+  }
+
+  @Override
+  public void run() {
+/**
+ * the outer daemon loop exits on down()
+ */
+LOG.info("Background ContainerMetadataScanner starting up");
+while (!stopping) {
+  scrub();
+  if (!stopping) {
+try {
+  Thread.sleep(300000); /* 5 min between scans */
 
 Review comment:
   This logic was present in ContainerScrubber.java before this patch. I just 
refactored it. Let me make it configurable.
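
   A minimal, self-contained sketch of that direction (class and method names
   are illustrative, not the actual patch): the interval is injected from
   configuration rather than hard-coded, and the sleep stays interruptible so
   shutdown is prompt.

       import java.util.concurrent.TimeUnit;

       /** Illustrative scan loop with a configurable interval. */
       final class MetadataScanLoop extends Thread {

         private final long intervalMillis; // value read from configuration
         private volatile boolean stopping = false;

         MetadataScanLoop(long intervalMillis) {
           this.intervalMillis = intervalMillis;
           setDaemon(true);
         }

         @Override
         public void run() {
           while (!stopping) {
             scrub();
             try {
               TimeUnit.MILLISECONDS.sleep(intervalMillis);
             } catch (InterruptedException e) {
               Thread.currentThread().interrupt(); // preserve interrupt status
               break; // shutdown requested
             }
           }
         }

         private void scrub() {
           // metadata checks would run here
         }

         void shutdown() {
           stopping = true;
           interrupt(); // wake a sleeping scanner immediately
         }
       }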
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283498)
Time Spent: 2h 40m  (was: 2.5h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283493
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 17:18
Start Date: 26/Jul/19 17:18
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307835754
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerDataScanner.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.ozoneimpl;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+import org.apache.hadoop.hdfs.util.Canceler;
+import org.apache.hadoop.hdfs.util.DataTransferThrottler;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * VolumeScanner scans a single volume.  Each VolumeScanner has its own thread.
+ * They are all managed by the DataNode's BlockScanner.
+ */
+public class ContainerDataScanner extends Thread {
+  public static final Logger LOG =
+  LoggerFactory.getLogger(ContainerDataScanner.class);
+
+  /**
+   * The volume that we're scanning.
+   */
+  private final HddsVolume volume;
+  private final ContainerController controller;
+  private final DataTransferThrottler throttler;
+  private final Canceler canceler;
+
+  /**
+   * True if the thread is stopping.
+   * Protected by this object's lock.
+   */
+  private volatile boolean stopping = false;
+
+
+  public ContainerDataScanner(ContainerController controller,
+  HddsVolume volume, long bytesPerSec) {
+this.controller = controller;
+this.volume = volume;
+this.throttler = new DataTransferThrottler(bytesPerSec);
+this.canceler = new Canceler();
+setName("ContainerDataScanner(" + volume + ")");
+setDaemon(true);
+  }
+
+  @Override
+  public void run() {
+LOG.trace("{}: thread starting.", this);
+try {
+  while (!stopping) {
+Iterator<Container> itr = controller.getContainers(volume);
+while (!stopping && itr.hasNext()) {
+  Container c = itr.next();
+  try {
+if (c.shouldScanData()) {
+  if(!c.scanData(throttler, canceler)) {
+controller.markContainerUnhealthy(
+c.getContainerData().getContainerID());
+  }
+}
+  } catch (IOException ex) {
+long containerId = c.getContainerData().getContainerID();
+LOG.warn("Unexpected exception while scanning container "
++ containerId, ex);
 
 Review comment:
   Yes, we do mark the container as unhealthy in case of I/O errors. But there 
are some cases where we cannot mark a container as unhealthy (e.g. when the 
rocksdb metadata is deleted or corrupted). In that case we just send an ICR to 
SCM. Here is the relevant code snippet - 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L900-L915
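
   As a compact illustration of those two failure paths (every type below is a
   hypothetical stand-in, not Ozone's API): a scan that completes but detects
   corruption flags the container locally, while metadata that cannot even be
   opened only triggers a report to SCM.

       import java.io.IOException;

       /** Hypothetical sketch of the two failure paths discussed above. */
       final class ScanOutcomes {

         interface Container {
           long getId();
           boolean scanData() throws IOException; // false => corruption found
         }

         interface Reporter {
           void markUnhealthy(long containerId);       // local state change
           void sendContainerReport(long containerId); // the "ICR" to SCM
         }

         static void scanOne(Container c, Reporter reporter) {
           try {
             if (!c.scanData()) {
               // Data is readable but corrupt: safe to flag locally.
               reporter.markUnhealthy(c.getId());
             }
           } catch (IOException e) {
             // Metadata (e.g. the RocksDB instance) is unreadable: local state
             // cannot be updated reliably, so only notify SCM.
             reporter.sendContainerReport(c.getId());
           }
         }
       }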
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283493)
Time Spent: 2h 20m  (was: 2h 10m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>   

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283494
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 17:18
Start Date: 26/Jul/19 17:18
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307835843
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
 ##
 @@ -120,10 +132,70 @@ public TestKeyValueContainerCheck(String metadataImpl) {
 container.close();
 
 // next run checks on a Closed Container
-valid = kvCheck.fullCheck();
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+HddsConfigKeys.HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT),
+null);
 assertTrue(valid);
   }
 
+  /**
+   * Sanity test, when there are corruptions induced.
+   * @throws Exception
+   */
+  @Test
+  public void testKeyValueContainerCheckCorruption() throws Exception {
 
 Review comment:
   sure will do.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283494)
Time Spent: 2.5h  (was: 2h 20m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283488
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 17:13
Start Date: 26/Jul/19 17:13
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307833921
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
-  private void iterateBlockDB(ReferenceCountedDB db)
-  throws IOException {
-Preconditions.checkState(db != null);
-
-// get "normal" keys from the Block DB
-try(KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
-new File(onDiskContainerData.getContainerPath()))) {
-
-  // ensure there is a chunk file for each key in the DB
-  while (kvIter.hasNext()) {
+  while(kvIter.hasNext()) {
 BlockData block = kvIter.nextBlock();
-
-List<ContainerProtos.ChunkInfo> chunkInfoList = block.getChunks();
-for (ContainerProtos.ChunkInfo chunk : chunkInfoList) {
-  File chunkFile;
-  chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
+for(ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+  File chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
   ChunkInfo.getFromProtoBuf(chunk));
-
   if (!chunkFile.exists()) {
 // concurrent mutation in Block DB? lookup the block again.
 byte[] bdata = db.getStore().get(
 Longs.toByteArray(block.getBlockID().getLocalID()));
-if (bdata == null) {
-  LOG.trace("concurrency with delete, ignoring deleted block");
-  break; // skip to next block from kvIter
-} else {
-  String errorStr = "Missing chunk file "
-  + chunkFile.getAbsolutePath();
-  throw new IOException(errorStr);
+if (bdata != null) {
+  throw new IOException("Missing chunk file "
+  + chunkFile.getAbsolutePath());
+}
+  } else if (chunk.getChecksumData().getType()
+  != ContainerProtos.ChecksumType.NONE){
 
 Review comment:
   OK, let me refactor. Regarding the second question: I want to avoid disk I/O 
when we know that we don't have a checksum to verify against.
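
   A standalone sketch of that guard (toy types, not Ozone's Checksum
   implementation): when the recorded checksum type is NONE there is nothing to
   compare against, so the chunk file is never read.

       import java.io.IOException;
       import java.nio.file.Files;
       import java.nio.file.Path;
       import java.util.zip.CRC32;

       /** Toy verifier showing the skip-on-NONE guard. */
       final class ChunkVerifier {

         enum ChecksumType { NONE, CRC32 }

         static void verify(Path chunkFile, ChecksumType type, long expectedCrc)
             throws IOException {
           if (type == ChecksumType.NONE) {
             return; // no stored checksum, so skip the disk read entirely
           }
           CRC32 crc = new CRC32();
           crc.update(Files.readAllBytes(chunkFile)); // real code reads in blocks
           if (crc.getValue() != expectedCrc) {
             throw new IOException("Checksum mismatch in " + chunkFile);
           }
         }
       }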
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283488)
Time Spent: 2h 10m  (was: 2h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283351
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 26/Jul/19 13:18
Start Date: 26/Jul/19 13:18
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#issuecomment-515447538
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 94 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 60 | Maven dependency ordering for branch |
   | +1 | mvninstall | 658 | trunk passed |
   | +1 | compile | 400 | trunk passed |
   | +1 | checkstyle | 75 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 899 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 175 | trunk passed |
   | 0 | spotbugs | 489 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 733 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 32 | Maven dependency ordering for patch |
   | +1 | mvninstall | 573 | the patch passed |
   | +1 | compile | 368 | the patch passed |
   | +1 | javac | 368 | the patch passed |
   | +1 | checkstyle | 74 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 722 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 176 | the patch passed |
   | -1 | findbugs | 236 | hadoop-hdds generated 1 new + 0 unchanged - 0 fixed 
= 1 total (was 0) |
   ||| _ Other Tests _ |
   | -1 | unit | 373 | hadoop-hdds in the patch failed. |
   | -1 | unit | 2535 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 40 | The patch does not generate ASF License warnings. |
   | | | 8849 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | FindBugs | module:hadoop-hdds |
   |  |  Dead store to startTime in 
org.apache.hadoop.ozone.container.ozoneimpl.ContainerMetadataScanner.scrub()  
At 
ContainerMetadataScanner.java:org.apache.hadoop.ozone.container.ozoneimpl.ContainerMetadataScanner.scrub()
  At ContainerMetadataScanner.java:[line 74] |
   | Failed junit tests | hadoop.ozone.TestOzoneConfigurationFields |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.ozone.om.TestScmSafeMode |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1154 |
   | JIRA Issue | HDDS-1200 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux a90351c712ee 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / aebac6d |
   | Default Java | 1.8.0_212 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/1/artifact/out/new-findbugs-hadoop-hdds.html
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/1/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/1/testReport/ |
   | Max. process+thread count | 4495 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/container-service 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1154/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283005
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:29
Start Date: 25/Jul/19 23:29
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307539277
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
 ##
 @@ -120,10 +132,70 @@ public TestKeyValueContainerCheck(String metadataImpl) {
 container.close();
 
 // next run checks on a Closed Container
-valid = kvCheck.fullCheck();
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+HddsConfigKeys.HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT),
+null);
 assertTrue(valid);
   }
 
+  /**
+   * Sanity test, when there are corruptions induced.
+   * @throws Exception
+   */
+  @Test
+  public void testKeyValueContainerCheckCorruption() throws Exception {
 
 Review comment:
   If you add a label called ozone, then this pull request will be processed by 
Jenkins. Just FYI. I have done that for this patch.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283005)
Time Spent: 1h 50m  (was: 1h 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283004
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:29
Start Date: 25/Jul/19 23:29
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307539083
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
 ##
 @@ -120,10 +132,70 @@ public TestKeyValueContainerCheck(String metadataImpl) {
 container.close();
 
 // next run checks on a Closed Container
-valid = kvCheck.fullCheck();
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+HddsConfigKeys.HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT),
+null);
 assertTrue(valid);
   }
 
+  /**
+   * Sanity test, when there are corruptions induced.
+   * @throws Exception
+   */
+  @Test
+  public void testKeyValueContainerCheckCorruption() throws Exception {
 
 Review comment:
   /label ozone
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283004)
Time Spent: 1h 40m  (was: 1.5h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=282998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282998
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:28
Start Date: 25/Jul/19 23:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307538835
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
 ##
 @@ -120,10 +132,70 @@ public TestKeyValueContainerCheck(String metadataImpl) {
 container.close();
 
 // next run checks on a Closed Container
-valid = kvCheck.fullCheck();
+valid = kvCheck.fullCheck(new DataTransferThrottler(
+HddsConfigKeys.HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT),
+null);
 assertTrue(valid);
   }
 
+  /**
+   * Sanity test, when there are corruptions induced.
+   * @throws Exception
+   */
+  @Test
+  public void testKeyValueContainerCheckCorruption() throws Exception {
 
 Review comment:
   You can run this test under a memory profiler to make sure there are no leaks. 
Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 282998)
Time Spent: 50m  (was: 40m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283000
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:28
Start Date: 25/Jul/19 23:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307538039
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
-  private void iterateBlockDB(ReferenceCountedDB db)
-  throws IOException {
-Preconditions.checkState(db != null);
-
-// get "normal" keys from the Block DB
-try(KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
-new File(onDiskContainerData.getContainerPath()))) {
-
-  // ensure there is a chunk file for each key in the DB
-  while (kvIter.hasNext()) {
+  while(kvIter.hasNext()) {
 BlockData block = kvIter.nextBlock();
-
-List<ContainerProtos.ChunkInfo> chunkInfoList = block.getChunks();
-for (ContainerProtos.ChunkInfo chunk : chunkInfoList) {
-  File chunkFile;
-  chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
+for(ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+  File chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
   ChunkInfo.getFromProtoBuf(chunk));
-
   if (!chunkFile.exists()) {
 // concurrent mutation in Block DB? lookup the block again.
 byte[] bdata = db.getStore().get(
 Longs.toByteArray(block.getBlockID().getLocalID()));
-if (bdata == null) {
-  LOG.trace("concurrency with delete, ignoring deleted block");
-  break; // skip to next block from kvIter
-} else {
-  String errorStr = "Missing chunk file "
-  + chunkFile.getAbsolutePath();
-  throw new IOException(errorStr);
+if (bdata != null) {
+  throw new IOException("Missing chunk file "
+  + chunkFile.getAbsolutePath());
+}
+  } else if (chunk.getChecksumData().getType()
+  != ContainerProtos.ChecksumType.NONE){
 
 Review comment:
   Also, what happens if the checksum type is NONE? Do we silently skip? That makes 
sense; just asking to make sure that my understanding is correct.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283000)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283001
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:28
Start Date: 25/Jul/19 23:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307538592
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerDataScanner.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.ozoneimpl;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+import org.apache.hadoop.hdfs.util.Canceler;
+import org.apache.hadoop.hdfs.util.DataTransferThrottler;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.apache.hadoop.ozone.container.common.volume.HddsVolume;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * VolumeScanner scans a single volume.  Each VolumeScanner has its own thread.
+ * They are all managed by the DataNode's BlockScanner.
+ */
+public class ContainerDataScanner extends Thread {
+  public static final Logger LOG =
+  LoggerFactory.getLogger(ContainerDataScanner.class);
+
+  /**
+   * The volume that we're scanning.
+   */
+  private final HddsVolume volume;
+  private final ContainerController controller;
+  private final DataTransferThrottler throttler;
+  private final Canceler canceler;
+
+  /**
+   * True if the thread is stopping.
+   * Protected by this object's lock.
+   */
+  private volatile boolean stopping = false;
+
+
+  public ContainerDataScanner(ContainerController controller,
+  HddsVolume volume, long bytesPerSec) {
+this.controller = controller;
+this.volume = volume;
+this.throttler = new DataTransferThrottler(bytesPerSec);
+this.canceler = new Canceler();
+setName("ContainerDataScanner(" + volume + ")");
+setDaemon(true);
+  }
+
+  @Override
+  public void run() {
+LOG.trace("{}: thread starting.", this);
+try {
+  while (!stopping) {
+Iterator<Container> itr = controller.getContainers(volume);
+while (!stopping && itr.hasNext()) {
+  Container c = itr.next();
+  try {
+if (c.shouldScanData()) {
+  if(!c.scanData(throttler, canceler)) {
+controller.markContainerUnhealthy(
+c.getContainerData().getContainerID());
+  }
+}
+  } catch (IOException ex) {
+long containerId = c.getContainerData().getContainerID();
+LOG.warn("Unexpected exception while scanning container "
++ containerId, ex);
 
 Review comment:
   If we are not able to read the container, should we mark the container as 
unhealthy, even if we got an exception? I am not sure all exceptions mean the 
container is unhealthy, but for some exceptions, yes, it is unhealthy.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283001)
Time Spent: 1h 10m  (was: 1h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283003
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:28
Start Date: 25/Jul/19 23:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307537516
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
 Review comment:
   Can you please run this in profiler mode and make sure there are no 
memory leaks in this code path? Nothing to do with your patch at all; just that 
we have found some issues here earlier. Run it under something like 
VisualVM and see whether we release all memory when we get out of the loop.
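
   On the leak question, the refactored try-with-resources form is at least
   leak-safe by construction: both resources close in reverse declaration order
   even when the body throws. A tiny demonstration with stub classes (not
   ReferenceCountedDB or KeyValueBlockIterator):

       /** Stub resources showing reverse-order close in try-with-resources. */
       final class CloseOrderDemo {

         static final class Db implements AutoCloseable {
           @Override public void close() { System.out.println("db closed"); }
         }

         static final class BlockIterator implements AutoCloseable {
           @Override public void close() { System.out.println("iterator closed"); }
         }

         public static void main(String[] args) {
           try (Db db = new Db();
                BlockIterator iter = new BlockIterator()) {
             throw new RuntimeException("scan failed mid-loop");
           } catch (RuntimeException e) {
             // Output before this point: "iterator closed" then "db closed" --
             // both resources were released despite the exception.
             System.out.println("handled: " + e.getMessage());
           }
         }
       }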
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283003)
Time Spent: 1.5h  (was: 1h 20m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=283002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283002
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:28
Start Date: 25/Jul/19 23:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307536162
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java
 ##
 @@ -68,11 +68,16 @@
   public static final String HDDS_CONTAINERSCRUB_ENABLED =
   "hdds.containerscrub.enabled";
   public static final boolean HDDS_CONTAINERSCRUB_ENABLED_DEFAULT = false;
+
   public static final boolean HDDS_SCM_SAFEMODE_ENABLED_DEFAULT = true;
   public static final String HDDS_SCM_SAFEMODE_MIN_DATANODE =
   "hdds.scm.safemode.min.datanode";
   public static final int HDDS_SCM_SAFEMODE_MIN_DATANODE_DEFAULT = 1;
 
+  public static final String HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND =
+  "hdds.container.scanner.volume.bytes.per.second";
+  public static final long
+  HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT = 1048576L;
 
 
 Review comment:
   1048576L: Ozone supports writing values like 1MB here, if that is useful.
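
   A toy illustration of why the human-readable form helps (this parser is a
   stand-in, not Hadoop's storage-size handling): "1MB" reads better than the
   magic number 1048576L and parses to the same value.

       /** Toy parser for human-readable sizes such as "1MB". */
       final class SizeParser {

         static long parseBytes(String value) {
           String v = value.trim().toUpperCase();
           long multiplier = 1L;
           if (v.endsWith("KB")) {
             multiplier = 1024L;
             v = v.substring(0, v.length() - 2);
           } else if (v.endsWith("MB")) {
             multiplier = 1024L * 1024L;
             v = v.substring(0, v.length() - 2);
           } else if (v.endsWith("GB")) {
             multiplier = 1024L * 1024L * 1024L;
             v = v.substring(0, v.length() - 2);
           }
           return Long.parseLong(v.trim()) * multiplier;
         }

         public static void main(String[] args) {
           System.out.println(parseBytes("1MB")); // 1048576, the default above
         }
       }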
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 283002)
Time Spent: 1h 20m  (was: 1h 10m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=282999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282999
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:28
Start Date: 25/Jul/19 23:28
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307537762
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
 ##
 @@ -220,43 +229,66 @@ private void checkBlockDB() throws IOException {
   throw new IOException(dbFileErrorMsg);
 }
 
-
 onDiskContainerData.setDbFile(dbFile);
 try(ReferenceCountedDB db =
-BlockUtils.getDB(onDiskContainerData, checkConfig)) {
-  iterateBlockDB(db);
-}
-  }
+BlockUtils.getDB(onDiskContainerData, checkConfig);
+KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
+new File(onDiskContainerData.getContainerPath()))) {
 
-  private void iterateBlockDB(ReferenceCountedDB db)
-  throws IOException {
-Preconditions.checkState(db != null);
-
-// get "normal" keys from the Block DB
-try(KeyValueBlockIterator kvIter = new KeyValueBlockIterator(containerID,
-new File(onDiskContainerData.getContainerPath()))) {
-
-  // ensure there is a chunk file for each key in the DB
-  while (kvIter.hasNext()) {
+  while(kvIter.hasNext()) {
 BlockData block = kvIter.nextBlock();
-
-List<ContainerProtos.ChunkInfo> chunkInfoList = block.getChunks();
-for (ContainerProtos.ChunkInfo chunk : chunkInfoList) {
-  File chunkFile;
-  chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
+for(ContainerProtos.ChunkInfo chunk : block.getChunks()) {
+  File chunkFile = ChunkUtils.getChunkFile(onDiskContainerData,
   ChunkInfo.getFromProtoBuf(chunk));
-
   if (!chunkFile.exists()) {
 // concurrent mutation in Block DB? lookup the block again.
 byte[] bdata = db.getStore().get(
 Longs.toByteArray(block.getBlockID().getLocalID()));
-if (bdata == null) {
-  LOG.trace("concurrency with delete, ignoring deleted block");
-  break; // skip to next block from kvIter
-} else {
-  String errorStr = "Missing chunk file "
-  + chunkFile.getAbsolutePath();
-  throw new IOException(errorStr);
+if (bdata != null) {
+  throw new IOException("Missing chunk file "
+  + chunkFile.getAbsolutePath());
+}
+  } else if (chunk.getChecksumData().getType()
+  != ContainerProtos.ChecksumType.NONE){
 
 Review comment:
   Care to break this if-block into a function? I am OK if it is not possible 
or too much work; do it only if it is easy for you.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 282999)
Time Spent: 1h  (was: 50m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=282985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282985
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 23:12
Start Date: 25/Jul/19 23:12
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1154: 
[HDDS-1200] Add support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307535953
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java
 ##
 @@ -68,11 +68,16 @@
   public static final String HDDS_CONTAINERSCRUB_ENABLED =
   "hdds.containerscrub.enabled";
   public static final boolean HDDS_CONTAINERSCRUB_ENABLED_DEFAULT = false;
+
   public static final boolean HDDS_SCM_SAFEMODE_ENABLED_DEFAULT = true;
   public static final String HDDS_SCM_SAFEMODE_MIN_DATANODE =
   "hdds.scm.safemode.min.datanode";
   public static final int HDDS_SCM_SAFEMODE_MIN_DATANODE_DEFAULT = 1;
 
+  public static final String HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND =
+  "hdds.container.scanner.volume.bytes.per.second";
 
 Review comment:
   Sorry, my standard Ozone comment: can we please use the configuration-based 
API for these changes?
   
https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API
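
   For readers unfamiliar with that API, a hedged sketch of what the
   annotation-driven version of this setting might look like; the class name,
   annotation attributes, and tag below are assumptions, not the committed code.

       import org.apache.hadoop.hdds.conf.Config;
       import org.apache.hadoop.hdds.conf.ConfigGroup;
       import org.apache.hadoop.hdds.conf.ConfigTag;
       import org.apache.hadoop.hdds.conf.ConfigType;

       /** Sketch only; attribute values are assumptions, not the final patch. */
       @ConfigGroup(prefix = "hdds.containerscrub")
       public class ContainerScrubberConfiguration {

         private long bandwidthPerVolume;

         @Config(key = "volume.bytes.per.second",
             type = ConfigType.LONG,
             defaultValue = "1048576",
             tags = {ConfigTag.DATANODE},
             description = "Bandwidth cap for the per-volume data scanner.")
         public void setBandwidthPerVolume(long bandwidthPerVolume) {
           this.bandwidthPerVolume = bandwidthPerVolume;
         }

         public long getBandwidthPerVolume() {
           return bandwidthPerVolume;
         }
       }

   The group would then typically be materialized with something like
   OzoneConfiguration#getObject(ContainerScrubberConfiguration.class); again,
   treat the exact shape as an assumption rather than the final patch.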
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 282985)
Time Spent: 40m  (was: 0.5h)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=282897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282897
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 20:49
Start Date: 25/Jul/19 20:49
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307494588
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerMetadataScanner.java
 ##
 @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.hadoop.ozone.container.ozoneimpl;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.net.ntp.TimeStamp;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * This class is responsible to perform metadata verification of the
+ * containers.
+ */
+public class ContainerMetadataScanner extends Thread {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(ContainerMetadataScanner.class);
+
+  private final ContainerController controller;
+  /**
+   * True if the thread is stopping.
+   * Protected by this object's lock.
+   */
+  private boolean stopping = false;
+
+  public ContainerMetadataScanner(ContainerController controller) {
+    this.controller = controller;
+    setName("ContainerMetadataScanner");
+    setDaemon(true);
+  }
+
+  @Override
+  public void run() {
+    // the outer daemon loop exits on shutdown()
+    LOG.info("Background ContainerMetadataScanner starting up");
+    while (!stopping) {
+      scrub();
+      if (!stopping) {
+        try {
+          Thread.sleep(300000); /* 5 min between scans */
 
 Review comment:
   This should be configurable, no?
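   
   For what it's worth, a minimal sketch of what making it configurable could 
look like with Hadoop's stock Configuration API; the key name and default 
below are hypothetical, not what the patch ultimately adopted:

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

final class ScanIntervalSketch {
  // Hypothetical key name, for illustration only.
  static final String SCAN_INTERVAL_KEY =
      "hdds.container.scanner.metadata.scan.interval";
  static final long SCAN_INTERVAL_DEFAULT_MS = 5 * 60 * 1000L;

  static long scanIntervalMs(Configuration conf) {
    // getTimeDuration lets operators write values such as "5m" or "300s".
    return conf.getTimeDuration(SCAN_INTERVAL_KEY,
        SCAN_INTERVAL_DEFAULT_MS, TimeUnit.MILLISECONDS);
  }
}

   The loop would then call Thread.sleep(scanIntervalMs(conf)) instead of 
hard-coding the interval.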
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 282897)
Time Spent: 0.5h  (was: 20m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=282849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282849
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 25/Jul/19 19:02
Start Date: 25/Jul/19 19:02
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154#discussion_r307454335
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsConfigKeys.java
 ##
 @@ -68,11 +68,16 @@
   public static final String HDDS_CONTAINERSCRUB_ENABLED =
   "hdds.containerscrub.enabled";
   public static final boolean HDDS_CONTAINERSCRUB_ENABLED_DEFAULT = false;
+
   public static final boolean HDDS_SCM_SAFEMODE_ENABLED_DEFAULT = true;
   public static final String HDDS_SCM_SAFEMODE_MIN_DATANODE =
   "hdds.scm.safemode.min.datanode";
   public static final int HDDS_SCM_SAFEMODE_MIN_DATANODE_DEFAULT = 1;
 
+  public static final String HDDS_CONTAINER_SCANNER_VOLUME_BYTES_PER_SECOND =
+  "hdds.container.scanner.volume.bytes.per.second";
 
 Review comment:
   We could make this a bit more intuitive by adding, let's say, 
throttle.bytes.per.second?
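   
   Name aside, the setting is a per-volume byte budget for scanner reads. A 
minimal sketch of such a throttle, here using Guava's RateLimiter purely for 
illustration (the patch itself may use a different mechanism):

import com.google.common.util.concurrent.RateLimiter;

final class ScannerThrottleSketch {
  private final RateLimiter limiter;

  ScannerThrottleSketch(long bytesPerSecond) {
    // One permit == one byte of scanner I/O per second.
    this.limiter = RateLimiter.create(bytesPerSecond);
  }

  void beforeRead(int chunkSizeBytes) {
    // Blocks until reading the next chunk fits within the byte budget.
    limiter.acquire(chunkSizeBytes);
  }
}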
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 282849)
Time Spent: 20m  (was: 10m)

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1200) Ozone Data Scrubbing : Checksum verification for chunks

2019-07-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1200?focusedWorklogId=282274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-282274
 ]

ASF GitHub Bot logged work on HDDS-1200:


Author: ASF GitHub Bot
Created on: 24/Jul/19 20:55
Start Date: 24/Jul/19 20:55
Worklog Time Spent: 10m 
  Work Description: hgadre commented on pull request #1154: [HDDS-1200] Add 
support for checksum verification in data scrubber
URL: https://github.com/apache/hadoop/pull/1154
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 282274)
Time Spent: 10m
Remaining Estimate: 0h

> Ozone Data Scrubbing : Checksum verification for chunks
> ---
>
> Key: HDDS-1200
> URL: https://issues.apache.org/jira/browse/HDDS-1200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Background scrubber should read each chunk and verify the checksum.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org