[jira] [Created] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible
Erik Krogen created HDFS-14979:
----------------------------------

             Summary: [Observer Node] Balancer should submit getBlocks to Observer Node when possible
                 Key: HDFS-14979
                 URL: https://issues.apache.org/jira/browse/HDFS-14979
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: balancer & mover, hdfs
            Reporter: Erik Krogen
            Assignee: Erik Krogen

In HDFS-14162, we made it so that the Balancer could function when {{ObserverReadProxyProvider}} was in use. However, the Balancer would still read from the active NameNode, because {{getBlocks}} wasn't annotated as {{@ReadOnly}}. This task is to enable the Balancer to actually read from the Observer Node to alleviate load on the active NameNode.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
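The fix amounts to marking {{getBlocks}} with the read-only annotation so the proxy provider may route it to an Observer. Below is a minimal, self-contained sketch of how such annotation-driven routing works; the annotation, interface, and method signatures are simplified stand-ins for illustration, not the actual HDFS types:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ReadOnlyRoutingSketch {

    // Simplified stand-in for HDFS's @ReadOnly marker annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ReadOnly {}

    // Hypothetical protocol interface: getBlocks is marked read-only, so a
    // proxy provider may route it to an Observer; setBalancerBandwidth is not.
    interface BalancerProtocol {
        @ReadOnly
        void getBlocks();

        void setBalancerBandwidth();
    }

    // The routing decision, conceptually: read-only calls may go to the
    // Observer, everything else must go to the active NameNode.
    static boolean canRouteToObserver(String methodName) {
        try {
            return BalancerProtocol.class.getMethod(methodName)
                .isAnnotationPresent(ReadOnly.class);
        } catch (NoSuchMethodException e) {
            return false;  // unknown method: be conservative, use the active
        }
    }

    public static void main(String[] args) {
        System.out.println("getBlocks -> observer? "
            + canRouteToObserver("getBlocks"));
        System.out.println("setBalancerBandwidth -> observer? "
            + canRouteToObserver("setBalancerBandwidth"));
    }
}
```

In HDFS the proxy provider similarly inspects the invoked method's annotations per call, which is why a missing {{@ReadOnly}} on {{getBlocks}} silently pins the Balancer to the active NameNode.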
[jira] [Created] (HDFS-14978) In-place Erasure Coding Conversion
Wei-Chiu Chuang created HDFS-14978:
----------------------------------

             Summary: In-place Erasure Coding Conversion
                 Key: HDFS-14978
                 URL: https://issues.apache.org/jira/browse/HDFS-14978
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: erasure-coding
    Affects Versions: 3.0.0
            Reporter: Wei-Chiu Chuang
            Assignee: Wei-Chiu Chuang

HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses encoding algorithms to reduce disk space usage while retaining the redundancy necessary for data recovery. It was a huge amount of work, but it is only just getting adopted after almost 2 years.

One usability problem that’s blocking users from adopting HDFS Erasure Coding is that existing replicated files have to be copied to an EC-enabled directory explicitly. Renaming a file/directory into an EC-enabled directory does not automatically convert the blocks. Therefore users typically perform the following steps to erasure-code existing files:
{noformat}
Create $tmp directory, set EC policy on it
Distcp $src to $tmp
Delete $src (rm -rf $src)
mv $tmp $src
{noformat}
There are several reasons why this is not popular:
* Complex. The process involves several steps: distcp the data to a temporary destination; delete the source; move the destination to the source path.
* Availability. There is a short period where nothing exists at the source path, and jobs may fail unexpectedly.
* Overhead. During the copy phase, there is a point in time where the source and destination files all exist at the same time, exhausting disk space.
* Not snapshot-friendly. If a snapshot is taken prior to the conversion, the source (replicated) files are preserved in the cluster too. Therefore, the conversion actually increases storage space usage.
* Not management-friendly. This approach changes the file inode number, modification time and access time. Erasure-coded files are supposed to store cold data, but this conversion makes the data “hot” again.
* Bulky. It’s either all or nothing.
The directory may already be partially erasure-coded, but this approach simply erasure-codes everything again.

To ease data management, we should offer a utility tool to convert replicated files to erasure-coded files in place.
[jira] [Resolved] (HDDS-2325) BenchMarkDatanodeDispatcher genesis test is failing with NPE
[ https://issues.apache.org/jira/browse/HDDS-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek resolved HDDS-2325.
-------------------------------
    Fix Version/s: 0.5.0
       Resolution: Fixed

> BenchMarkDatanodeDispatcher genesis test is failing with NPE
> ------------------------------------------------------------
>
>                 Key: HDDS-2325
>                 URL: https://issues.apache.org/jira/browse/HDDS-2325
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>            Reporter: Marton Elek
>            Assignee: Marton Elek
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> ## What changes were proposed in this pull request?
> Genesis is a microbenchmark tool for Ozone based on JMH
> (https://openjdk.java.net/projects/code-tools/jmh/).
>
> Due to the recent Datanode changes, the BenchMarkDatanodeDispatcher is failing
> with an NPE:
>
> {code:java}
> java.lang.NullPointerException
> 	at org.apache.hadoop.ozone.container.common.interfaces.Handler.<init>(Handler.java:69)
> 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.<init>(KeyValueHandler.java:114)
> 	at org.apache.hadoop.ozone.container.common.interfaces.Handler.getHandlerForContainerType(Handler.java:78)
> 	at org.apache.hadoop.ozone.genesis.BenchMarkDatanodeDispatcher.initialize(BenchMarkDatanodeDispatcher.java:115)
> 	at org.apache.hadoop.ozone.genesis.generated.BenchMarkDatanodeDispatcher_createContainer_jmhTest._jmh_tryInit_f_benchmarkdatanodedispatcher0_G(BenchMarkDatanodeDispatcher_createContainer_jmhTest.java:438)
> 	at org.apache.hadoop.ozone.genesis.generated.BenchMarkDatanodeDispatcher_createContainer_jmhTest.createContainer_Throughput(BenchMarkDatanodeDispatcher_createContainer_jmhTest.java:71)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
> 	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> And this is just the biggest problem; there are a few others. I propose the following fixes:
>
> *fix 1*: The NPE is thrown because the 'context' object is required by the
> KeyValueHandler/Handler classes.
> In fact the context is not required; we need only two pieces of functionality/information
> from it: the ability to send an icr (IncrementalContainerReport) and the ID of the datanode.
> The Law of Demeter principle suggests depending only on the minimum required
> information from other classes.
> For example, instead of holding the context but using only
> context.getParent().getDatanodeDetails().getUuidString(), we can hold only the
> UUID string, which makes the Handler/KeyValueHandler easier to test (unit and benchmark).
> This is the biggest (but still small) change in this patch: I started to use
> the datanodeId and an icrSender instead of the full context.
> *fix 2,3:* There were a few other problems. The scmId was missing if writeChunk
> was called from the benchmark, and the Checksum was also missing.
> *fix 4:* I also had a few other problems: very large containers are used
> (default 5G), and as the benchmark starts by creating 100 containers, it requires
> 500G of space by default. I adjusted the container size to make it possible to
> run on a local machine.
>
> ## How can this patch be tested?
> {code:java}
> ./ozone genesis -benchmark=BenchMarkDatanodeDispatcher.writeChunk{code}
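The *fix 1* refactor described above, replacing the full state context with just the datanode ID and an ICR sender, can be sketched as follows. The names here ({{IcrSender}}, the {{Handler}} constructor, {{createContainer}}) are simplified stand-ins for illustration, not the actual Ozone classes:

```java
import java.util.ArrayList;
import java.util.List;

public class DemeterSketch {

    // Minimal stand-in for an ICR sender: the one behaviour the handler
    // actually needs from the state context.
    interface IcrSender {
        void send(String containerId);
    }

    // After the refactor: the handler depends only on what it uses. This makes
    // it trivial to construct in unit tests and JMH benchmarks, with no full
    // StateContext or datanode state machine required.
    static class Handler {
        private final String datanodeId;
        private final IcrSender icrSender;

        Handler(String datanodeId, IcrSender icrSender) {
            this.datanodeId = datanodeId;
            this.icrSender = icrSender;
        }

        String createContainer(String containerId) {
            icrSender.send(containerId);  // report the new container
            return datanodeId + ":" + containerId;
        }
    }

    public static void main(String[] args) {
        // In a test or benchmark, the sender can be any lambda/collector.
        List<String> reports = new ArrayList<>();
        Handler h = new Handler("dn-uuid-1", reports::add);
        System.out.println(h.createContainer("c-42"));
        System.out.println(reports);
    }
}
```

Compare this with the original shape, where constructing a Handler required a context whose only use was `context.getParent().getDatanodeDetails().getUuidString()`: the whole object graph had to exist just to reach one string.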
[jira] [Created] (HDDS-2459) Refactor ReplicationManager to consider maintenance states
Stephen O'Donnell created HDDS-2459:
-----------------------------------

             Summary: Refactor ReplicationManager to consider maintenance states
                 Key: HDDS-2459
                 URL: https://issues.apache.org/jira/browse/HDDS-2459
             Project: Hadoop Distributed Data Store
          Issue Type: Sub-task
          Components: SCM
    Affects Versions: 0.5.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell

In its current form, the replication manager does not consider decommission or maintenance states when checking whether replicas are sufficiently replicated. With the introduction of maintenance states, it needs to take decommission and maintenance states into account when deciding whether blocks are over- or under-replicated.

It also needs to provide an API that allows the decommission manager to check whether blocks are over- or under-replicated, so the decommission manager can decide whether a node has completed decommission or maintenance.
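One way to picture the check described above: decommissioning replicas must be treated as already lost, while maintenance replicas are expected to return and so may count toward a lower threshold. The sketch below is an assumption-laden illustration, with the enum, method, and thresholds all invented here rather than taken from ReplicationManager:

```java
import java.util.Arrays;
import java.util.List;

public class ReplicationCheckSketch {

    // Hypothetical replica states, reduced to what the check needs.
    enum ReplicaState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

    // Illustrative sufficiency rule (not the actual SCM logic):
    // decommissioning/decommissioned replicas never count; when the shortfall
    // is explained by maintenance replicas, only a lower "maintenance minimum"
    // of healthy copies is required, since maintenance nodes should return.
    static boolean isSufficientlyReplicated(List<ReplicaState> replicas,
                                            int replicationFactor,
                                            int maintenanceMinimum) {
        long healthy = replicas.stream()
            .filter(r -> r == ReplicaState.IN_SERVICE).count();
        long maintenance = replicas.stream()
            .filter(r -> r == ReplicaState.IN_MAINTENANCE).count();
        if (healthy >= replicationFactor) {
            return true;  // fully replicated on in-service nodes
        }
        return healthy + maintenance >= replicationFactor
            && healthy >= maintenanceMinimum;
    }

    public static void main(String[] args) {
        List<ReplicaState> replicas = Arrays.asList(
            ReplicaState.IN_SERVICE, ReplicaState.IN_SERVICE, ReplicaState.IN_MAINTENANCE);
        System.out.println(isSufficientlyReplicated(replicas, 3, 2));  // true
        System.out.println(isSufficientlyReplicated(
            Arrays.asList(ReplicaState.IN_SERVICE, ReplicaState.DECOMMISSIONED), 3, 2));  // false
    }
}
```

Exposing a predicate of this shape is also what the decommission manager needs: a node may leave the cluster only once every container it holds passes the check without counting that node's own replicas.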
[jira] [Created] (HDFS-14977) Quota Usage and Content summary are not same in Truncate with Snapshot
hemanthboyina created HDFS-14977:
--------------------------------

             Summary: Quota Usage and Content summary are not same in Truncate with Snapshot
                 Key: HDFS-14977
                 URL: https://issues.apache.org/jira/browse/HDFS-14977
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: hemanthboyina
            Assignee: hemanthboyina
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    XML : Parsing Error(s):
       hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
       hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

    FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
       Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

    Failed junit tests:
       hadoop.util.TestReadWriteDiskValidator
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.hdfs.server.datanode.TestBpServiceActorScheduler
       hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
       hadoop.hdfs.server.datanode.TestDirectoryScanner
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.registry.secure.TestSecureLogins
       hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2
       hadoop.yarn.sls.TestSLSRunner

   cc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]
   javac:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [328K]
   cc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-compile-cc-root-jdk1.8.0_222.txt [4.0K]
   javac:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-compile-javac-root-jdk1.8.0_222.txt [308K]
   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-checkstyle-root.txt [16M]
   hadolint:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-patch-hadolint.txt [4.0K]
   pathlen:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/pathlen.txt [12K]
   pylint:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-patch-pylint.txt [24K]
   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-patch-shellcheck.txt [72K]
   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-patch-shelldocs.txt [8.0K]
   whitespace:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/whitespace-eol.txt [12M]
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/whitespace-tabs.txt [1.3M]
   xml:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/xml.txt [12K]
   findbugs:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]
   javadoc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_222.txt [1.1M]
   unit:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [160K]
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [232K]
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/502/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
[jira] [Created] (HDFS-14976) HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options
Souryakanta Dwivedy created HDFS-14976:
--------------------------------------

             Summary: HDFS:fsck option "-list-corruptfileblocks" suppress all other output while being used with different combination of fsck options
                 Key: HDFS-14976
                 URL: https://issues.apache.org/jira/browse/HDFS-14976
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: tools
    Affects Versions: 3.1.2
            Reporter: Souryakanta Dwivedy
         Attachments: fsck_log.PNG, image-2019-11-11-15-40-22-128.png

The fsck option "-list-corruptfileblocks" suppresses all other output when used in combination with other fsck options.

Steps:
* Run the hdfs fsck command with different combinations of options, e.g.:
{noformat}
hdfs fsck / -files -blocks -locations -storagepolicies
hdfs fsck / -files -blocks -openforwrite
hdfs fsck / -files -blocks -showprogress
hdfs fsck / -files -openforwrite
{noformat}
For all these combinations, the expected output is displayed.
* Run the same fsck options together with "-list-corruptfileblocks"; it suppresses the output of all the other options and only displays the list of corrupt files, which is not correct behavior. Try these combinations:
{noformat}
hdfs fsck / -files -blocks -list-corruptfileblocks
hdfs fsck / -list-corruptfileblocks -files -blocks
{noformat}

Either it should display the output of all the other options along with the corrupted-file info, or the help info has to specify that this option must be used alone, without any combination of other options.

!image-2019-11-11-15-40-22-128.png!
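The second alternative mentioned in the report (documenting/enforcing that the option stands alone) could be realized by rejecting invalid combinations up front. The sketch below is a hypothetical helper, not the actual DFSck argument parser; the option set and error message are assumptions:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FsckOptionCheckSketch {

    // The reporting options whose output -list-corruptfileblocks currently
    // suppresses (list assembled from the report above).
    private static final Set<String> REPORT_OPTIONS = new HashSet<>(Arrays.asList(
        "-files", "-blocks", "-locations", "-openforwrite",
        "-showprogress", "-storagepolicies"));

    // Returns an error message for an invalid combination, or null if the
    // arguments are acceptable. Rejecting early is one fix; the other
    // alternative is to merge the outputs instead of suppressing them.
    static String validate(List<String> args) {
        if (!args.contains("-list-corruptfileblocks")) {
            return null;
        }
        for (String a : args) {
            if (REPORT_OPTIONS.contains(a)) {
                return "fsck: -list-corruptfileblocks cannot be combined with " + a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(Arrays.asList("/", "-files", "-blocks")));
        System.out.println(validate(Arrays.asList("/", "-list-corruptfileblocks")));
        System.out.println(validate(Arrays.asList("/", "-files", "-list-corruptfileblocks")));
    }
}
```

Failing fast with an explicit message is friendlier than silently dropping the other options' output, which is what makes the current behavior surprising.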