Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
Thanks Eric!

--Yongjun

On Mon, Jun 11, 2018 at 8:05 AM, Eric Payne wrote:

> Sorry, Yongjun. My +1 is also binding
>
> +1 (binding)
> -Eric Payne
>
> On Friday, June 1, 2018, 12:25:36 PM CDT, Eric Payne <eric.payne1...@yahoo.com> wrote:
>
> Thanks a lot, Yongjun, for your hard work on this release.
>
> +1
> - Built from source
> - Installed on 6 node pseudo cluster
>
> Tested the following in the Capacity Scheduler:
> - Verified that running apps in labelled queues restricts tasks to the labelled nodes.
> - Verified that various queue config properties for CS are refreshable
> - Verified streaming jobs work as expected
> - Verified that user weights work as expected
> - Verified that FairOrderingPolicy in a CS queue will evenly assign resources
> - Verified running yarn shell application runs as expected
>
> On Friday, June 1, 2018, 12:48:26 AM CDT, Yongjun Zhang <yjzhan...@apache.org> wrote:
>
> Greetings all,
>
> I've created the first release candidate (RC0) for Apache Hadoop 3.0.3. This is our next maintenance release, following up 3.0.2. It includes about 249 important fixes and improvements, among which there are 8 blockers. See
> https://issues.apache.org/jira/issues/?filter=12343997
>
> The RC artifacts are available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
>
> The maven artifacts are available via
> https://repository.apache.org/content/repositories/orgapachehadoop-1126
>
> Please try the release and vote; the vote will run for the usual 5 working days, ending on 06/07/2018 PST. Your participation would be really appreciated.
>
> I bumped into quite a few issues along the way; many thanks to the people who helped, especially Sammi Chen, Andrew Wang, Junping Du, and Eddy Xu.
>
> Thanks,
>
> --Yongjun
[jira] [Created] (HDDS-162) DataNode container reads/writes should be disallowed for open containers if the replication type mismatches
Shashikant Banerjee created HDDS-162:
------------------------------------

             Summary: DataNode container reads/writes should be disallowed for open containers if the replication type mismatches
                 Key: HDDS-162
                 URL: https://issues.apache.org/jira/browse/HDDS-162
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode, SCM
            Reporter: Shashikant Banerjee
            Assignee: Shashikant Banerjee
             Fix For: 0.2.1


In Ozone, a container can be created via the Ratis or Standalone protocol. However, reads/writes on the containers on Datanodes can be done through either protocol if the container location is known. A case may arise where data is being written into a container via Ratis (i.e., the container is in the open state on the Datanodes) while it is read via Standalone. This should not be allowed, because a read from a follower Datanode in Ratis via the Standalone protocol might return stale data. Once the container is closed on the Datanode, data can be read via either protocol.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
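The rule proposed above can be sketched as a small access guard. This is purely illustrative plain Java, not HDDS code: the names `ReplicationType`, `ContainerState`, and `ContainerGuard` are assumptions for the sketch.

```java
// Hypothetical sketch of the proposed rule: while a container is OPEN,
// only the protocol it was created with may touch it; once CLOSED, its
// replicas are immutable and either protocol may read it.
enum ReplicationType { RATIS, STAND_ALONE }

enum ContainerState { OPEN, CLOSED }

class ContainerGuard {
    static boolean isAccessAllowed(ContainerState state,
                                   ReplicationType createdWith,
                                   ReplicationType requestedVia) {
        if (state == ContainerState.CLOSED) {
            return true; // closed replicas cannot go stale
        }
        // open container: reject protocol mismatches (e.g. a Standalone
        // read hitting a Ratis follower that is behind the leader)
        return createdWith == requestedVia;
    }
}
```

A Standalone read of an open Ratis container would be rejected under this rule, while the same read after close succeeds.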
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/

[Jun 11, 2018 5:53:37 AM] (sammi.chen) HADOOP-15499. Performance severe drops when running
[Jun 11, 2018 6:12:44 PM] (haibochen) YARN-8323. FairScheduler.allocConf should be declared as volatile.
[Jun 11, 2018 6:16:21 PM] (haibochen) YARN-8322. Change log level when there is an IOException when the
[Jun 11, 2018 6:19:33 PM] (haibochen) YARN-8321. AllocationFileLoaderService.getAllocationFile() should be
[Jun 11, 2018 6:20:29 PM] (inigoiri) HDFS-13653. Make dfs.client.failover.random.order a per nameservice
[Jun 11, 2018 10:13:18 PM] (yzhang) Update CHANGES, RELEASENOTES, and jdiff for 3.0.3 release. (cherry
[Jun 11, 2018 11:02:32 PM] (xyao) HDDS-72. Add deleteTransactionId field in ContainerInfo. Contributed by

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
       Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadListener; locked 75% of time. Unsynchronized access at AllocationFileLoaderService.java:[line 117]

    Failed junit tests :

       hadoop.util.TestBasicDiskValidator
       hadoop.hdfs.client.impl.TestBlockReaderLocal
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
       hadoop.hdfs.web.TestWebHdfsTimeouts
       hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities
       hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity
       hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun
       hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps
       hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction
       hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageSchema
       hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain
       hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
       hadoop.mapred.TestMRTimelineEventHandling

   cc:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/diff-compile-cc-root.txt  [4.0K]

   javac:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/diff-compile-javac-root.txt  [336K]

   checkstyle:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/diff-checkstyle-root.txt  [4.0K]

   pathlen:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/pathlen.txt  [12K]

   pylint:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/diff-patch-pylint.txt  [24K]

   shellcheck:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/diff-patch-shellcheck.txt  [20K]

   shelldocs:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/diff-patch-shelldocs.txt  [16K]

   whitespace:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/whitespace-eol.txt  [9.4M]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/whitespace-tabs.txt  [1.1M]

   xml:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/xml.txt  [4.0K]

   findbugs:

       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html  [8.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-hdds_client.txt  [56K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt  [52K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt  [60K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-hdds_tools.txt  [12K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-ozone_client.txt  [4.0K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/809/artifact/out/branch-findbugs-hadoop-ozone_common.txt  [24K]
[jira] [Created] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode
Wei-Chiu Chuang created HDFS-13672:
-----------------------------------

             Summary: clearCorruptLazyPersistFiles could crash NameNode
                 Key: HDFS-13672
                 URL: https://issues.apache.org/jira/browse/HDFS-13672
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Wei-Chiu Chuang


I started a NameNode on a pretty large fsimage. Since the NameNode was started without any DataNodes, all blocks (100 million) were "corrupt". Afterwards, I observed that FSNamesystem#clearCorruptLazyPersistFiles() held the write lock for a long time:

{noformat}
18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held for 46024 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
java.lang.Thread.run(Thread.java:748)
	Number of suppressed write-lock reports: 0
	Longest write-lock held interval: 46024
{noformat}

Here's the relevant code:

{code}
writeLock();
try {
  final Iterator<Block> it = blockManager.getCorruptReplicaBlockIterator();
  while (it.hasNext()) {
    Block b = it.next();
    BlockInfo blockInfo = blockManager.getStoredBlock(b);
    if (blockInfo.getBlockCollection().getStoragePolicyID() == lpPolicy.getId()) {
      filesToDelete.add(blockInfo.getBlockCollection());
    }
  }

  for (BlockCollection bc : filesToDelete) {
    LOG.warn("Removing lazyPersist file " + bc.getName() + " with no replicas.");
    changed |= deleteInternal(bc.getName(), false, false, false);
  }
} finally {
  writeUnlock();
}
{code}

In essence, the iteration over the corrupt replica list should be broken into smaller batches to avoid a single long wait.
Since this operation holds the NameNode write lock for more than 45 seconds (the default ZKFC connection timeout), an extreme case like this (100 million corrupt blocks) could lead to a NameNode failover.
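The batching idea suggested above can be sketched in plain Java. This is an illustrative stand-in, not FSNamesystem itself: `BatchedScrubber` and the batch size are assumptions, and in the real NameNode the corrupt-replica iterator would additionally need to tolerate concurrent modification between batches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

// Hypothetical sketch: scan a long iterator under a write lock, but cap
// how many elements are examined per lock acquisition, releasing the
// lock between batches so other waiters (RPC handlers, ZKFC health
// checks) can make progress.
class BatchedScrubber {
    private static final int MAX_PER_LOCK_HOLD = 10_000; // assumed batch size
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Collects elements matching the predicate (e.g. lazyPersist files
     *  among corrupt blocks) in bounded lock-hold batches. */
    <T> List<T> collectInBatches(Iterator<T> corrupt, Predicate<T> shouldDelete) {
        List<T> toDelete = new ArrayList<>();
        boolean more = true;
        while (more) {
            lock.writeLock().lock();
            try {
                int scanned = 0;
                while (corrupt.hasNext() && scanned++ < MAX_PER_LOCK_HOLD) {
                    T b = corrupt.next();
                    if (shouldDelete.test(b)) {
                        toDelete.add(b);
                    }
                }
                more = corrupt.hasNext(); // lock released between batches
            } finally {
                lock.writeLock().unlock();
            }
        }
        return toDelete;
    }
}
```

With 100 million corrupt blocks, each lock hold is bounded by the batch size instead of the full list, so no single hold approaches the ZKFC timeout.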
[jira] [Created] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
Yiqun Lin created HDFS-13671:
-----------------------------

             Summary: Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
                 Key: HDFS-13671
                 URL: https://issues.apache.org/jira/browse/HDFS-13671
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.3, 3.1.0
            Reporter: Yiqun Lin


NameNode hung when deleting large files/blocks. The stack info:

{code}
"IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
	at org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
{code}

In the current deletion logic in the NameNode, there are mainly two steps:

* Collect the INodes and all blocks to be deleted, then delete the INodes.
* Remove the blocks chunk by chunk in a loop.

The first step should be the more expensive operation and take more time. However, we always see the NN hang during the remove-block operation. Looking into this: we introduced a new structure, {{FoldedTreeSet}}, for better performance in handling FBRs/IBRs. But compared with the earlier implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, since it takes additional time to rebalance tree nodes. When there are many blocks to be removed/deleted, this looks bad. For the get-type operations in {{DatanodeStorageInfo}}, we only provide {{getBlockIterator}} to return a block iterator, and no other get operation for a specified block. Do we still need {{FoldedTreeSet}} here? As we know, {{FoldedTreeSet}} benefits getting nodes, not deleting/updating nodes. Maybe we can revert this to the earlier implementation.
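The cost argument above (tree-based removal pays per-element rebalancing that a hash-based structure avoids) can be exercised with a plain-Java sketch. `TreeSet` stands in for the balanced-tree behavior attributed to FoldedTreeSet and `HashSet` for a hash-based map; FoldedTreeSet itself is not used here, and absolute timings depend on the JVM, so they are only indicative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in comparison: removing n elements one-by-one costs
// O(n log n) plus rebalancing in a balanced tree, versus expected O(n)
// in a hash set. This mirrors the bulk block removal during a large
// directory delete.
class BulkRemove {
    /** Fills the set with 0..n-1, then removes them all; returns the
     *  nanoseconds spent in the removal loop. */
    static long timeRemoval(Set<Long> set, long n) {
        for (long i = 0; i < n; i++) set.add(i);
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) set.remove(i);
        return System.nanoTime() - start;
    }
}
```

Running `timeRemoval` on both set types with a block count in the millions makes the gap visible; the hash-based variant typically finishes the removal loop considerably faster, which is the direction of the revert proposed in the issue.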
[jira] [Created] (HDFS-13670) Decommissioning datanode never ends
zhangzhuo created HDFS-13670:
-----------------------------

             Summary: Decommissioning datanode never ends
                 Key: HDFS-13670
                 URL: https://issues.apache.org/jira/browse/HDFS-13670
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0
            Reporter: zhangzhuo


In my cluster, one datanode has been decommissioning for a very long time and never finishes. On the web UI, I can see this datanode has one under-replicated block. How can I force this datanode into the decommissioned state, or what can I do to make this block satisfy the replication factor?
[jira] [Created] (HDDS-161) Add functionality to queue ContainerClose command from SCM Heartbeat Response to Ratis
Shashikant Banerjee created HDDS-161:
------------------------------------

             Summary: Add functionality to queue ContainerClose command from SCM Heartbeat Response to Ratis
                 Key: HDDS-161
                 URL: https://issues.apache.org/jira/browse/HDDS-161
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode, SCM
            Reporter: Shashikant Banerjee
            Assignee: Shashikant Banerjee
             Fix For: 0.2.1


When a container needs to be closed at the Datanode, SCM will queue a close command which will be encoded as part of the Heartbeat Response to the Datanode. This command will be picked up from the response at the Datanode and then submitted to the XceiverServer to process the close command.
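The flow described above can be sketched as a producer/consumer queue in plain Java. The names here (`CloseCommandQueue`, container ids as `Long`) are invented for illustration; the real Datanode would hand the drained commands to the XceiverServer rather than return them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the heartbeat handler enqueues close commands
// decoded from the SCM heartbeat response, and a processing step drains
// the queue and submits each command for execution.
class CloseCommandQueue {
    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();

    /** Called when a heartbeat response carrying close commands arrives. */
    void onHeartbeatResponse(List<Long> containersToClose) {
        queue.addAll(containersToClose);
    }

    /** Drains everything queued so far; in the Datanode this is where the
     *  commands would be submitted for the actual container close. */
    List<Long> drainPending() {
        List<Long> pending = new ArrayList<>();
        queue.drainTo(pending);
        return pending;
    }
}
```

Decoupling receipt (heartbeat thread) from processing (close-command worker) keeps the heartbeat path fast even when a close takes time.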
Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64
For more details, see https://builds.apache.org/job/hadoop-trunk-win/495/

[Jun 11, 2018 5:53:37 AM] (sammi.chen) HADOOP-15499. Performance severe drops when running

-1 overall

The following subsystems voted -1:
    compile mvninstall pathlen unit

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc javac

The following subsystems are considered long running:
(runtime bigger than 1h 00m 00s)
    unit

Specific tests:

    Failed junit tests :

       hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec
       hadoop.fs.contract.rawlocal.TestRawlocalContractAppend
       hadoop.fs.TestFileUtil
       hadoop.fs.TestFsShellCopy
       hadoop.fs.TestFsShellList
       hadoop.fs.TestLocalFileSystem
       hadoop.http.TestHttpServer
       hadoop.http.TestHttpServerLogs
       hadoop.io.compress.TestCodec
       hadoop.io.nativeio.TestNativeIO
       hadoop.ipc.TestIPC
       hadoop.ipc.TestSocketFactory
       hadoop.metrics2.impl.TestStatsDMetrics
       hadoop.security.TestSecurityUtil
       hadoop.security.TestShellBasedUnixGroupsMapping
       hadoop.security.token.TestDtUtilShell
       hadoop.util.TestDiskCheckerWithDiskIo
       hadoop.util.TestNativeCodeLoader
       hadoop.hdfs.client.impl.TestBlockReaderLocal
       hadoop.hdfs.qjournal.server.TestJournalNode
       hadoop.hdfs.qjournal.server.TestJournalNodeSync
       hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages
       hadoop.hdfs.server.datanode.fsdataset.impl.TestProvidedImpl
       hadoop.hdfs.server.datanode.TestBlockPoolSliceStorage
       hadoop.hdfs.server.datanode.TestBlockScanner
       hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics
       hadoop.hdfs.server.datanode.TestDataNodeFaultInjector
       hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
       hadoop.hdfs.server.datanode.TestDirectoryScanner
       hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage
       hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC
       hadoop.hdfs.server.mover.TestMover
       hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
       hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
       hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
       hadoop.hdfs.server.namenode.TestEditLogRace
       hadoop.hdfs.server.namenode.TestReencryption
       hadoop.hdfs.server.namenode.TestStartup
       hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
       hadoop.hdfs.TestDatanodeStartupFixesLegacyStorageIDs
       hadoop.hdfs.TestDFSShell
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
       hadoop.hdfs.TestDFSUpgradeFromImage
       hadoop.hdfs.TestFetchImage
       hadoop.hdfs.TestHDFSFileSystemContract
       hadoop.hdfs.TestLeaseRecovery2
       hadoop.hdfs.TestPersistBlocks
       hadoop.hdfs.TestPread
       hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData
       hadoop.hdfs.TestSecureEncryptionZoneWithKMS
       hadoop.hdfs.TestTrashWithSecureEncryptionZones
       hadoop.hdfs.tools.TestDFSAdmin
       hadoop.hdfs.tools.TestDFSAdminWithHA
       hadoop.hdfs.web.TestWebHDFS
       hadoop.hdfs.web.TestWebHdfsUrl
       hadoop.fs.http.server.TestHttpFSServerWebServer
       hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
       hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestAppLogAggregatorImpl
       hadoop.yarn.server.nodemanager.containermanager.TestAuxServices
       hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
       hadoop.yarn.server.nodemanager.recovery.TestNMLeveldbStateStoreService
       hadoop.yarn.server.nodemanager.TestContainerExecutor
       hadoop.yarn.server.nodemanager.TestLocalDirsHandlerService
       hadoop.yarn.server.nodemanager.TestNodeManagerResync
       hadoop.yarn.server.webproxy.amfilter.TestAmFilter
       hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer
       hadoop.yarn.server.timeline.security.TestTimelineAuthenticationFilterForV1
       hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
       hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestFSSchedulerConfigurationStore
       hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestLeveldbConfigurationStore
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing
       hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementProcessor
       hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
       hadoop.yarn.server.resourcemanager.TestResourceTrackerService