2.8 Release activities
Hi,

Just curious -- what is the current status of the 2.8 release? It looks like the release process has been stalled for some time. There are 5 or 6 blocker / critical bugs targeted at the upcoming 2.8 release:

https://issues.apache.org/jira/browse/YARN-6654?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20MAPREDUCE%2C%20YARN)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.8.2%2C%202.8.3)

I think we can address them with a reasonable amount of effort. We are interested in putting 2.8.x in production, and it would be great to have a maintenance Apache release for the 2.8 line. I wonder, are there any concerns blocking the release? We might be able to get some help internally to fix the issues in the 2.8 line. I can also volunteer to be the release manager for 2.8.2 if more coordination effort is needed to push the release out.

Regards,
Haohui
[jira] [Created] (HDFS-12131) Add some of the FSNamesystem JMX values as metrics
Erik Krogen created HDFS-12131:
-------------------------------

Summary: Add some of the FSNamesystem JMX values as metrics
Key: HDFS-12131
URL: https://issues.apache.org/jira/browse/HDFS-12131
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen
Priority: Minor

A number of useful numbers are emitted via the FSNamesystem JMX, but not through the metrics system. These would be useful to be able to track over time, e.g. to alert on via standard metrics systems or to view trends and rate changes:

* NumLiveDataNodes
* NumDeadDataNodes
* NumDecomLiveDataNodes
* NumDecomDeadDataNodes
* NumDecommissioningDataNodes
* NumStaleStorages

This is a simple change that just requires annotating the JMX methods with {{@Metric}}.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
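The annotation-discovery pattern the issue relies on can be sketched in plain Java. This is a simplified stand-in: the `Metric` annotation below imitates the real `org.apache.hadoop.metrics2.annotation.Metric` (which lives in hadoop-common and carries more attributes), and `FSNamesystemSketch` is a hypothetical class, not the actual FSNamesystem; only the reflection-based discovery idea is the point.

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.*;

// Simplified stand-in for org.apache.hadoop.metrics2.annotation.Metric,
// only to illustrate how annotating a getter exposes it to a metrics system.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Metric {
    String value() default "";
}

// Hypothetical class mirroring the JMX getters named in the issue.
class FSNamesystemSketch {
    @Metric("Number of live DataNodes")
    public int getNumLiveDataNodes() { return 3; }

    @Metric("Number of dead DataNodes")
    public int getNumDeadDataNodes() { return 0; }

    // Not annotated: would remain JMX-only, invisible to the metrics system.
    public int getSomethingElse() { return 42; }
}

public class MetricSketch {
    // Collect the names of @Metric-annotated getters, the way a metrics
    // framework discovers sources via reflection at registration time.
    static List<String> discoverMetrics(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Metric.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // prints [getNumDeadDataNodes, getNumLiveDataNodes]
        System.out.println(discoverMetrics(FSNamesystemSketch.class));
    }
}
```

This is why the proposed change is small: the getters already exist for JMX, so adding the annotation is the only code change needed for the metrics system to pick them up.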
[jira] [Created] (HDFS-12130) Optimizing permission check for getContentSummary
Chen Liang created HDFS-12130:
------------------------------

Summary: Optimizing permission check for getContentSummary
Key: HDFS-12130
URL: https://issues.apache.org/jira/browse/HDFS-12130
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang

Currently, {{getContentSummary}} takes two phases to complete:

- Phase 1: check the permission of the entire subtree. If any subdirectory does not have {{READ_EXECUTE}}, an access control exception is thrown and {{getContentSummary}} terminates here (unless it's the super user).
- Phase 2: if phase 1 passed, traverse the entire tree recursively to get the actual content summary.

The issue is that both phases currently hold the fs lock. Phase 2 is already written such that it yields the fs lock periodically, so it does not block other operations for too long. Phase 1, however, does not yield, meaning the permission check phase can still block other operations for a long time.

One fix is to add lock yielding to phase 1. But a simpler fix is to merge phase 1 into phase 2. Namely, instead of doing a full traversal for the permission check first, we start with phase 2 directly, but for each directory, before obtaining its summary, we check its permission first. This way we take advantage of the existing lock yielding in the phase 2 code and are still able to check permissions and terminate on an access exception.

Thanks [~szetszwo] for the offline discussions!
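The merged single-pass traversal described above can be sketched as follows. All names here (`Dir`, `summarize`, the boolean `readExecute` flag) are illustrative stand-ins, not the actual HDFS INode/ContentSummary types; a real implementation would also yield and re-acquire the namesystem lock periodically inside the recursion, which this sketch only notes in a comment.

```java
import java.util.*;

// Local stand-in for the access failure thrown on a missing permission.
class AccessControlException extends RuntimeException {
    AccessControlException(String msg) { super(msg); }
}

// Hypothetical directory node: permission flag plus bytes of files it holds.
class Dir {
    final String name;
    final boolean readExecute;   // stand-in for the READ_EXECUTE permission
    final long fileBytes;        // bytes of files directly in this directory
    final List<Dir> children = new ArrayList<>();
    Dir(String name, boolean readExecute, long fileBytes) {
        this.name = name;
        this.readExecute = readExecute;
        this.fileBytes = fileBytes;
    }
}

public class ContentSummarySketch {
    // Single pass: the permission is checked per directory just before it is
    // summarized, so an access failure still terminates the operation early,
    // and the traversal can reuse the existing phase-2 lock-yielding points.
    static long summarize(Dir dir, boolean isSuperUser) {
        if (!isSuperUser && !dir.readExecute) {
            throw new AccessControlException("no READ_EXECUTE on " + dir.name);
        }
        long total = dir.fileBytes;
        for (Dir child : dir.children) {
            // (a real implementation would yield the fs lock periodically here)
            total += summarize(child, isSuperUser);
        }
        return total;
    }

    public static void main(String[] args) {
        Dir root = new Dir("/", true, 10);
        root.children.add(new Dir("/a", true, 5));
        System.out.println(summarize(root, false));  // prints 15
    }
}
```

The design trade-off is that the permission check for a deep subdirectory now happens only when the traversal reaches it, rather than up front, but the observable outcome (summary on success, access exception on failure) is the same.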
Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/

[Jul 11, 2017 6:19:08 PM] (jzhuge) HDFS-12052. Set SWEBHDFS delegation token kind when ssl is enabled in
[Jul 11, 2017 8:34:27 PM] (stevel) HADOOP-14535 wasb: implement high-performance random access and seek of
[Jul 12, 2017 2:35:50 AM] (aajisaka) HADOOP-14629. Improve exception checking in FileContext related JUnit
[Jul 12, 2017 4:06:41 AM] (jzhuge) HDFS-12114. Consistent HttpFS property names. Contributed by John Zhuge.
[Jul 12, 2017 9:37:39 AM] (stevel) HADOOP-14581. Restrict setOwner to list of user when security is enabled
[Jul 12, 2017 10:38:32 AM] (aajisaka) YARN-6809. Fix typo in ResourceManagerHA.md. Contributed by Yeliang

-1 overall

The following subsystems voted -1:
    compile mvninstall unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc javac

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

Failed junit tests:

    hadoop.ha.TestZKFailoverControllerStress
    hadoop.test.TestLambdaTestUtils
    hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
    hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
    hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
    hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer
    hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
    hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
    hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
    hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
    hadoop.hdfs.web.TestWebHdfsTimeouts
    hadoop.yarn.server.nodemanager.recovery.TestNMLeveldbStateStoreService
    hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
    hadoop.yarn.server.timeline.TestRollingLevelDB
    hadoop.yarn.server.timeline.TestTimelineDataManager
    hadoop.yarn.server.timeline.TestLeveldbTimelineStore
    hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore
    hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore
    hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer
    hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
    hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
    hadoop.yarn.server.resourcemanager.TestRMRestart
    hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
    hadoop.yarn.server.TestContainerManagerSecurity
    hadoop.yarn.client.api.impl.TestAMRMClient
    hadoop.yarn.client.api.impl.TestNMClient
    hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore
    hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient
    hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore
    hadoop.yarn.applications.distributedshell.TestDistributedShell
    hadoop.mapred.TestShuffleHandler
    hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService
    hadoop.mapred.TestMRTimelineEventHandling

Timed out junit tests:

    org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
    org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
    org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
    org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA
    org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
    org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
    org.apache.hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels

mvninstall:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-mvninstall-root.txt [620K]

compile:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-compile-root.txt [20K]

cc:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-compile-root.txt [20K]

javac:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-compile-root.txt [20K]

unit:
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-assemblies.txt [4.0K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [152K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [628K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [56K]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/p
[jira] [Created] (HDFS-12129) Ozone
Weiwei Yang created HDFS-12129:
-------------------------------

Summary: Ozone
Key: HDFS-12129
URL: https://issues.apache.org/jira/browse/HDFS-12129
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Weiwei Yang
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/462/

[Jul 11, 2017 9:22:44 AM] (sunilg) YARN-6714. IllegalStateException while handling APP_ATTEMPT_REMOVED
[Jul 11, 2017 12:40:11 PM] (yqlin) HDFS-12085. Reconfigure namenode heartbeat interval fails if the
[Jul 11, 2017 6:19:08 PM] (jzhuge) HDFS-12052. Set SWEBHDFS delegation token kind when ssl is enabled in
[Jul 11, 2017 8:34:27 PM] (stevel) HADOOP-14535 wasb: implement high-performance random access and seek of
[Jul 12, 2017 2:35:50 AM] (aajisaka) HADOOP-14629. Improve exception checking in FileContext related JUnit
[Jul 12, 2017 4:06:41 AM] (jzhuge) HDFS-12114. Consistent HttpFS property names. Contributed by John Zhuge.

-1 overall

The following subsystems voted -1:
    findbugs unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

FindBugs : module:hadoop-hdfs-project/hadoop-hdfs-client

    Possible exposure of partially initialized object in org.apache.hadoop.hdfs.DFSClient.initThreadsNumForStripedReads(int) At DFSClient.java:[line 2888]
    org.apache.hadoop.hdfs.server.protocol.SlowDiskReports.equals(Object) makes inefficient use of keySet iterator instead of entrySet iterator At SlowDiskReports.java:[line 105]

FindBugs : module:hadoop-hdfs-project/hadoop-hdfs

    Possible null pointer dereference in org.apache.hadoop.hdfs.qjournal.server.JournalNode.getJournalsStatus() due to return value of called method Dereferenced at JournalNode.java:[line 302]
    org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setClusterId(String) unconditionally sets the field clusterId At HdfsServerConstants.java:[line 193]
    org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setForce(int) unconditionally sets the field force At HdfsServerConstants.java:[line 217]
    org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setForceFormat(boolean) unconditionally sets the field isForceFormat At HdfsServerConstants.java:[line 229]
    org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setInteractiveFormat(boolean) unconditionally sets the field isInteractiveFormat At HdfsServerConstants.java:[line 237]
    Possible null pointer dereference in org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocksHelper(File, File, int, HardLink, boolean, File, List) due to return value of called method Dereferenced at DataStorage.java:[line 1339]
    Possible null pointer dereference in org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldLegacyOIVImages(String, long) due to return value of called method Dereferenced at NNStorageRetentionManager.java:[line 258]
    Possible null pointer dereference in org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil$1.visitFile(Path, BasicFileAttributes) due to return value of called method Dereferenced at NNUpgradeUtil.java:[line 133]
    Useless condition: argv.length >= 1 at this point At DFSAdmin.java:[line 2085]
    Useless condition: numBlocks == -1 at this point At ImageLoaderCurrent.java:[line 727]

FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager

    Useless object stored in variable removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:[line 642]
    org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdate
[jira] [Created] (HDFS-12128) Namenode failover may make balancer's efforts be in vain
liuyiyang created HDFS-12128:
-----------------------------

Summary: Namenode failover may make balancer's efforts be in vain
Key: HDFS-12128
URL: https://issues.apache.org/jira/browse/HDFS-12128
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Affects Versions: 2.6.0
Reporter: liuyiyang

The problem can be reproduced as follows:

1. In an HA cluster with imbalanced datanode usage, we run "start-balancer.sh" to make the cluster balanced;
2. Before starting the balancer, trigger a failover of the namenodes; this will make all datanodes be marked as stale by the active namenode;
3. Start the balancer to make the datanode usage balanced;
4. As the balancer runs, under-utilized datanodes' usage will increase, but over-utilized datanodes' usage will stay unchanged for a long time.

Since all datanodes are marked as stale, deletion will be postponed on stale datanodes. During balancing, the replicas on source datanodes can't be deleted immediately, so the total usage of the cluster will increase and won't decrease until the datanodes' stale state is cancelled. When the datanodes send their next block report to the namenode (the default interval is 6h), the active namenode will cancel the stale state of the datanodes.

I found that if replicas on source datanodes can't be deleted immediately in an OP_REPLACE operation via the del_hint to the namenode, the namenode will schedule replicas on the datanodes with the least remaining space for deletion instead of the replicas on the source datanodes. Unfortunately, the datanodes with the least remaining space may be the target datanodes of the balancing, which will lead to imbalanced datanode usage again. If the balancer finishes before the next block report, all postponed over-replicated replicas will be deleted based on the remaining space of the datanodes, which may make the balancer's efforts fruitless.
[jira] [Created] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands
Yiqun Lin created HDFS-12127:
-----------------------------

Summary: Ozone: Ozone shell: Add more testing for key shell commands
Key: HDFS-12127
URL: https://issues.apache.org/jira/browse/HDFS-12127
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ozone, tools
Affects Versions: HDFS-7240
Reporter: Yiqun Lin
Assignee: Yiqun Lin

Adding more unit tests for ozone key commands, similar to HDFS-12118.
[jira] [Created] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands
Yiqun Lin created HDFS-12126:
-----------------------------

Summary: Ozone: Ozone shell: Add more testing for bucket shell commands
Key: HDFS-12126
URL: https://issues.apache.org/jira/browse/HDFS-12126
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ozone, tools
Affects Versions: HDFS-7240
Reporter: Yiqun Lin
Assignee: Yiqun Lin

Adding more unit tests for bucket commands, similar to HDFS-12118.