2.8 Release activities

2017-07-12 Thread Haohui Mai
Hi,

Just curious -- what is the current status of the 2.8 release? It looks
like the release process has stalled for some time.

There are 5 or 6 blocker / critical bugs targeting the upcoming 2.8 release:

https://issues.apache.org/jira/browse/YARN-6654?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20MAPREDUCE%2C%20YARN)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.8.2%2C%202.8.3)

I think we can address them with a reasonable amount of effort.

We are interested in putting 2.8.x in production and it would be great to
have a maintenance Apache release for the 2.8 line.

I wonder, are there any concerns about getting the release out? We might
be able to get some help internally to fix the issues in the 2.8 line. I
can also volunteer to be the release manager for 2.8.2 if more
coordination effort is needed to push the release out.

Regards,
Haohui


[jira] [Created] (HDFS-12131) Add some of the FSNamesystem JMX values as metrics

2017-07-12 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-12131:
--

 Summary: Add some of the FSNamesystem JMX values as metrics
 Key: HDFS-12131
 URL: https://issues.apache.org/jira/browse/HDFS-12131
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen
Priority: Minor


A number of useful values are emitted via the FSNamesystem JMX bean, but not 
through the metrics system. It would be useful to be able to track these over 
time, e.g. to alert on them via standard metrics systems or to view trends and 
rates of change:
* NumLiveDataNodes
* NumDeadDataNodes
* NumDecomLiveDataNodes
* NumDecomDeadDataNodes
* NumDecommissioningDataNodes
* NumStaleStorages

This is a simple change that just requires annotating the JMX methods with 
{{@Metric}}.
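
As a rough sketch of the shape of the change: the metrics system picks up a value once the existing JMX getter carries the annotation. The snippet below declares a minimal stand-in for Hadoop's real {{org.apache.hadoop.metrics2.annotation.Metric}} so it compiles standalone, and the getter body is a placeholder, not the actual FSNamesystem code:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class MetricAnnotationSketch {
    // Minimal stand-in for org.apache.hadoop.metrics2.annotation.Metric,
    // declared here only so this sketch is self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Metric { String[] value() default {}; }

    // Annotating the existing JMX getter is the whole change; the metrics
    // system discovers annotated methods via reflection. The body here is
    // a placeholder -- the real method queries the DatanodeManager.
    @Metric({"NumLiveDataNodes", "Number of datanodes which are currently live"})
    public int getNumLiveDataNodes() {
        return 42;
    }

    public static void main(String[] args) throws Exception {
        // Show that the annotation is visible at runtime, as metrics2 would see it.
        Metric m = MetricAnnotationSketch.class
                .getMethod("getNumLiveDataNodes")
                .getAnnotation(Metric.class);
        System.out.println(m.value()[0]);
    }
}
```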



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12130) Optimizing permission check for getContentSummary

2017-07-12 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12130:
-

 Summary: Optimizing permission check for getContentSummary
 Key: HDFS-12130
 URL: https://issues.apache.org/jira/browse/HDFS-12130
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang


Currently, {{getContentSummary}} completes in two phases:
- phase 1: check the permission of the entire subtree. If any subdirectory does 
not have {{READ_EXECUTE}}, an access control exception is thrown and 
{{getContentSummary}} terminates here (unless the caller is the super user).
- phase 2: if phase 1 passed, traverse the entire tree recursively to compute 
the actual content summary.

An issue is that both phases currently hold the fs lock.

Phase 2 is already written to yield the fs lock periodically, so that it does 
not block other operations for too long. However, phase 1 does not yield, 
meaning the permission check phase can still block other operations for a long 
time.

One fix is to add lock yielding to phase 1. But a simpler fix is to merge phase 
1 into phase 2. Namely, instead of doing a full traversal for the permission 
check first, we start with phase 2 directly, but for each directory, check its 
permission before obtaining its summary. This way we take advantage of the 
existing lock yielding in the phase 2 code and are still able to check 
permissions and terminate on an access exception.

Thanks [~szetszwo] for the offline discussions!
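
A minimal, self-contained sketch of the merged single-pass idea (a toy directory tree and a counter stand in for the real FSNamesystem structures and fs-lock yield; the names {{Dir}}, {{YIELD_EVERY}}, etc. are illustrative, not the actual HDFS code):

```java
import java.util.ArrayList;
import java.util.List;

public class ContentSummarySketch {
    static class Dir {
        final String name;
        final boolean readExecute;   // does the caller have READ_EXECUTE here?
        final long fileBytes;        // bytes of files directly in this dir
        final List<Dir> children = new ArrayList<>();
        Dir(String name, boolean readExecute, long fileBytes) {
            this.name = name; this.readExecute = readExecute; this.fileBytes = fileBytes;
        }
    }

    static class AccessControlException extends RuntimeException {
        AccessControlException(String m) { super(m); }
    }

    static final int YIELD_EVERY = 2; // yield frequency, arbitrary for the sketch
    long yields = 0;                  // stands in for the periodic fs-lock yield
    int visited = 0;

    // Single pass: check permission on each directory just before summarizing
    // it, so the traversal's existing yield logic also covers the permission
    // check -- no separate, non-yielding full-tree permission pass.
    long summarize(Dir d) {
        if (!d.readExecute) {
            throw new AccessControlException("no READ_EXECUTE on " + d.name);
        }
        if (++visited % YIELD_EVERY == 0) {
            yields++;  // real code would release and re-acquire the fs lock here
        }
        long total = d.fileBytes;
        for (Dir child : d.children) {
            total += summarize(child);
        }
        return total;
    }
}
```

With this shape, an access failure on a deep subdirectory surfaces mid-traversal, after some yields have already happened, rather than inside a separate pre-pass that holds the lock for the whole subtree.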






Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le

2017-07-12 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/

[Jul 11, 2017 6:19:08 PM] (jzhuge) HDFS-12052. Set SWEBHDFS delegation token 
kind when ssl is enabled in
[Jul 11, 2017 8:34:27 PM] (stevel) HADOOP-14535 wasb: implement 
high-performance random access and seek of
[Jul 12, 2017 2:35:50 AM] (aajisaka) HADOOP-14629. Improve exception checking 
in FileContext related JUnit
[Jul 12, 2017 4:06:41 AM] (jzhuge) HDFS-12114. Consistent HttpFS property 
names. Contributed by John Zhuge.
[Jul 12, 2017 9:37:39 AM] (stevel) HADOOP-14581. Restrict setOwner to list of 
user when security is enabled
[Jul 12, 2017 10:38:32 AM] (aajisaka) YARN-6809. Fix typo in 
ResourceManagerHA.md. Contributed by Yeliang




-1 overall


The following subsystems voted -1:
compile mvninstall unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc javac


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.ha.TestZKFailoverControllerStress 
   hadoop.test.TestLambdaTestUtils 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 
   hadoop.hdfs.server.namenode.ha.TestBootstrapStandby 
   hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA 
   hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer 
   hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 
   hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.nodemanager.recovery.TestNMLeveldbStateStoreService 
   hadoop.yarn.server.nodemanager.TestNodeManagerShutdown 
   hadoop.yarn.server.timeline.TestRollingLevelDB 
   hadoop.yarn.server.timeline.TestTimelineDataManager 
   hadoop.yarn.server.timeline.TestLeveldbTimelineStore 
   hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore 
   hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore 
   
hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer 
   hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
   hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore 
   hadoop.yarn.server.resourcemanager.TestRMRestart 
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization 
   hadoop.yarn.server.TestContainerManagerSecurity 
   hadoop.yarn.client.api.impl.TestAMRMClient 
   hadoop.yarn.client.api.impl.TestNMClient 
   hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore 
   hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient 
   hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapred.TestShuffleHandler 
   hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService 
   hadoop.mapred.TestMRTimelineEventHandling 

Timed out junit tests :

   org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands 
   
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore 
   
org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA 
   
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
   
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels 
  

   mvninstall:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-mvninstall-root.txt
  [620K]

   compile:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-compile-root.txt
  [20K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-compile-root.txt
  [20K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-compile-root.txt
  [20K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-assemblies.txt
  [4.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [152K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [628K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [56K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/373/artifact/out/p

[jira] [Created] (HDFS-12129) Ozone

2017-07-12 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12129:
--

 Summary: Ozone
 Key: HDFS-12129
 URL: https://issues.apache.org/jira/browse/HDFS-12129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang









Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-07-12 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/462/

[Jul 11, 2017 9:22:44 AM] (sunilg) YARN-6714. IllegalStateException while 
handling APP_ATTEMPT_REMOVED
[Jul 11, 2017 12:40:11 PM] (yqlin) HDFS-12085. Reconfigure namenode heartbeat 
interval fails if the
[Jul 11, 2017 6:19:08 PM] (jzhuge) HDFS-12052. Set SWEBHDFS delegation token 
kind when ssl is enabled in
[Jul 11, 2017 8:34:27 PM] (stevel) HADOOP-14535 wasb: implement 
high-performance random access and seek of
[Jul 12, 2017 2:35:50 AM] (aajisaka) HADOOP-14629. Improve exception checking 
in FileContext related JUnit
[Jul 12, 2017 4:06:41 AM] (jzhuge) HDFS-12114. Consistent HttpFS property 
names. Contributed by John Zhuge.




-1 overall


The following subsystems voted -1:
findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-hdfs-project/hadoop-hdfs-client 
   Possible exposure of partially initialized object in 
org.apache.hadoop.hdfs.DFSClient.initThreadsNumForStripedReads(int) At 
DFSClient.java:object in 
org.apache.hadoop.hdfs.DFSClient.initThreadsNumForStripedReads(int) At 
DFSClient.java:[line 2888] 
   org.apache.hadoop.hdfs.server.protocol.SlowDiskReports.equals(Object) 
makes inefficient use of keySet iterator instead of entrySet iterator At 
SlowDiskReports.java:keySet iterator instead of entrySet iterator At 
SlowDiskReports.java:[line 105] 

FindBugs :

   module:hadoop-hdfs-project/hadoop-hdfs 
   Possible null pointer dereference in 
org.apache.hadoop.hdfs.qjournal.server.JournalNode.getJournalsStatus() due to 
return value of called method Dereferenced at 
JournalNode.java:org.apache.hadoop.hdfs.qjournal.server.JournalNode.getJournalsStatus()
 due to return value of called method Dereferenced at JournalNode.java:[line 
302] 
   
org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setClusterId(String)
 unconditionally sets the field clusterId At HdfsServerConstants.java:clusterId 
At HdfsServerConstants.java:[line 193] 
   
org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setForce(int)
 unconditionally sets the field force At HdfsServerConstants.java:force At 
HdfsServerConstants.java:[line 217] 
   
org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setForceFormat(boolean)
 unconditionally sets the field isForceFormat At 
HdfsServerConstants.java:isForceFormat At HdfsServerConstants.java:[line 229] 
   
org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setInteractiveFormat(boolean)
 unconditionally sets the field isInteractiveFormat At 
HdfsServerConstants.java:isInteractiveFormat At HdfsServerConstants.java:[line 
237] 
   Possible null pointer dereference in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocksHelper(File, File, 
int, HardLink, boolean, File, List) due to return value of called method 
Dereferenced at 
DataStorage.java:org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocksHelper(File,
 File, int, HardLink, boolean, File, List) due to return value of called method 
Dereferenced at DataStorage.java:[line 1339] 
   Possible null pointer dereference in 
org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldLegacyOIVImages(String,
 long) due to return value of called method Dereferenced at 
NNStorageRetentionManager.java:org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldLegacyOIVImages(String,
 long) due to return value of called method Dereferenced at 
NNStorageRetentionManager.java:[line 258] 
   Possible null pointer dereference in 
org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil$1.visitFile(Path, 
BasicFileAttributes) due to return value of called method Dereferenced at 
NNUpgradeUtil.java:org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil$1.visitFile(Path,
 BasicFileAttributes) due to return value of called method Dereferenced at 
NNUpgradeUtil.java:[line 133] 
   Useless condition:argv.length >= 1 at this point At DFSAdmin.java:[line 
2085] 
   Useless condition:numBlocks == -1 at this point At 
ImageLoaderCurrent.java:[line 727] 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
   Useless object stored in variable removedNullContainers of method 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List)
 At NodeStatusUpdaterImpl.java:removedNullContainers of method 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List)
 At NodeStatusUpdaterImpl.java:[line 642] 
   
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdate

[jira] [Created] (HDFS-12128) Namenode failover may make balancer's efforts be in vain

2017-07-12 Thread liuyiyang (JIRA)
liuyiyang created HDFS-12128:


 Summary: Namenode failover may make balancer's efforts be in vain
 Key: HDFS-12128
 URL: https://issues.apache.org/jira/browse/HDFS-12128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 2.6.0
Reporter: liuyiyang


The problem can be reproduced as follows:
1. In an HA cluster with imbalanced datanode usage, prepare to run 
"start-balancer.sh" to balance the cluster;
2. Before starting the balancer, trigger a namenode failover; this makes the 
active namenode mark all datanodes as stale;
3. Start the balancer to balance datanode usage;
4. While the balancer runs, under-utilized datanodes' usage increases, but 
over-utilized datanodes' usage stays unchanged for a long time.

Since all datanodes are marked as stale, replica deletion on stale datanodes is 
postponed. During balancing, the replicas on source datanodes can't be deleted 
immediately, so the total usage of the cluster increases and won't decrease 
until the datanodes' stale state is cleared. The active namenode clears the 
stale state when the datanodes send their next block report (the default 
interval is 6h). I found that if replicas on source datanodes can't be deleted 
immediately in the OP_REPLACE operation via the del_hint sent to the namenode, 
the namenode will instead schedule deletion of replicas on the datanodes with 
the least remaining space. Unfortunately, the datanodes with the least 
remaining space may be the balancer's target datanodes, which leads to 
imbalanced datanode usage again. If the balancer finishes before the next block 
report, all postponed over-replicated replicas will be deleted based on the 
datanodes' remaining space, which may render the balancer's efforts fruitless.
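
A toy sketch of the deletion-choice behavior described above (illustrative only: the {{Node}} type and {{chooseExcess}} method are hypothetical stand-ins, not the actual BlockManager excess-replica logic):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ExcessReplicaChoiceSketch {
    static class Node {
        final String name;
        final boolean stale;
        final long remainingBytes;
        Node(String name, boolean stale, long remainingBytes) {
            this.name = name; this.stale = stale; this.remainingBytes = remainingBytes;
        }
    }

    // Pick which replica holder should delete the excess replica.
    // Prefer the balancer's del_hint (the source datanode); but when that
    // hint can't be honored, the fallback picks the live node with the
    // least remaining space -- which may well be a balancing target,
    // undoing the balancer's work.
    static Optional<Node> chooseExcess(List<Node> holders, Node delHint) {
        if (delHint != null && !delHint.stale && holders.contains(delHint)) {
            return Optional.of(delHint);
        }
        return holders.stream()
                .filter(n -> !n.stale)  // deletion on stale nodes is postponed
                .min(Comparator.comparingLong((Node n) -> n.remainingBytes));
    }
}
```

In the reproduced scenario, every holder is stale at first (nothing is deleted), and by the time the next block report clears the staleness the hint no longer applies, so the least-remaining-space fallback fires.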






[jira] [Created] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands

2017-07-12 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-12127:


 Summary: Ozone: Ozone shell: Add more testing for key shell 
commands
 Key: HDFS-12127
 URL: https://issues.apache.org/jira/browse/HDFS-12127
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, tools
Affects Versions: HDFS-7240
Reporter: Yiqun Lin
Assignee: Yiqun Lin


Adding more unit tests for ozone key commands, similar to HDFS-12118.






[jira] [Created] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands

2017-07-12 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-12126:


 Summary: Ozone: Ozone shell: Add more testing for bucket shell 
commands
 Key: HDFS-12126
 URL: https://issues.apache.org/jira/browse/HDFS-12126
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, tools
Affects Versions: HDFS-7240
Reporter: Yiqun Lin
Assignee: Yiqun Lin


Adding more unit tests for bucket commands, similar to HDFS-12118.



