Apache Hadoop qbt Report: trunk+JDK11 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/636/

[Feb 27, 2024, 2:19:57 AM] (github) HDFS-17358. EC: infinite lease recovery caused by the length of RWR equals to zero or datanode does not have the replica. (#6509). Contributed by farmmamba.

-1 overall

The following subsystems voted -1:
    blanks hadolint mvnsite pathlen spotbugs unit xml

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    spotbugs :

       module:hadoop-common-project/hadoop-common
       Possible null pointer dereference in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:[line 332]

    spotbugs :

       module:hadoop-common-project
       Possible null pointer dereference in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:[line 332]

    spotbugs :

       module:hadoop-hdfs-project/hadoop-hdfs-client
       Redundant nullcheck of sockStreamList, which is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant null check at PeerCache.java:is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant null check at PeerCache.java:[line 158]

    spotbugs :

       module:hadoop-hdfs-project/hadoop-hdfs
       Redundant nullcheck of oldLock, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory) Redundant null check at DataStorage.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory) Redundant null check at DataStorage.java:[line 695]
       Redundant nullcheck of metaChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long, FileInputStream, FileChannel, String) Redundant null check at MappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long, FileInputStream, FileChannel, String) Redundant null check at MappableBlockLoader.java:[line 138]
       Redundant nullcheck of blockChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at MemoryMappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at MemoryMappableBlockLoader.java:[line 75]
       Redundant nullcheck of blockChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at NativePmemMappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at NativePmemMappableBlockLoader.java:[line 85]
       Redundant nullcheck of metaChannel, which is known to be non-null in
[jira] [Created] (HDFS-17401) Erasure Coding: Excess internal block can't be deleted correctly
Ruinan Gu created HDFS-17401:

Summary: Erasure Coding: Excess internal block can't be deleted correctly
Key: HDFS-17401
URL: https://issues.apache.org/jira/browse/HDFS-17401
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ruinan Gu
Assignee: Ruinan Gu

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1512/

No changes

-1 overall

The following subsystems voted -1:
    blanks hadolint pathlen spotbugs unit xml

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    spotbugs :

       module:hadoop-common-project/hadoop-common
       Possible null pointer dereference in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:[line 332]

    spotbugs :

       module:hadoop-common-project
       Possible null pointer dereference in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Dereferenced at ValueQueue.java:[line 332]

    spotbugs :

       module:hadoop-hdfs-project/hadoop-hdfs-client
       Redundant nullcheck of sockStreamList, which is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant null check at PeerCache.java:is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant null check at PeerCache.java:[line 158]

    spotbugs :

       module:hadoop-hdfs-project/hadoop-hdfs-httpfs
       Redundant nullcheck of xAttrs, which is known to be non-null in org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) Redundant null check at HttpFSFileSystem.java:is known to be non-null in org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) Redundant null check at HttpFSFileSystem.java:[line 1373]

    spotbugs :

       module:hadoop-yarn-project/hadoop-yarn
       org.apache.hadoop.yarn.service.ServiceScheduler$1.load(ConfigFile) may return null, but is declared @Nonnull At ServiceScheduler.java:is declared @Nonnull At ServiceScheduler.java:[line 555]

    spotbugs :

       module:hadoop-hdfs-project/hadoop-hdfs-rbf
       Redundant nullcheck of dns, which is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType) Redundant null check at RouterRpcServer.java:is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType) Redundant null check at RouterRpcServer.java:[line 1091]

    spotbugs :

       module:hadoop-hdfs-project
       Redundant nullcheck of xAttrs, which is known to be non-null in org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) Redundant null check at HttpFSFileSystem.java:is known to be non-null in org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) Redundant null check at HttpFSFileSystem.java:[line 1373]
       Redundant nullcheck of sockStreamList, which is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant null check at PeerCache.java:is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant null check at PeerCache.java:[line 158]
       Redundant nullcheck of dns, which is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType) Redundant null check at RouterRpcServer.java:is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType) Redundant null check at RouterRpcServer.java:[line
[jira] [Created] (HDFS-17400) Expose metrics for inode ChildrenList size
Srinivasu Majeti created HDFS-17400:

Summary: Expose metrics for inode ChildrenList size
Key: HDFS-17400
URL: https://issues.apache.org/jira/browse/HDFS-17400
Project: Hadoop HDFS
Issue Type: Improvement
Components: dfs
Affects Versions: 3.1.1
Reporter: Srinivasu Majeti

A very common scenario is that customer jobs fail when writing into a directory "x" because the number of items in "x" has reached the limit configured by dfs.namenode.fs-limits.max-directory-items.

Example:
The directory item limit of /tmp is exceeded: limit=1048576 items=1048576

I think we need to expose new metrics in "NameNodeMetrics" that list the paths exceeding 90% of dfs.namenode.fs-limits.max-directory-items. However, recomputing the path size and removing paths from the metrics on every delete would be costly. So, should we consider letting the SNN handle this from updateCountForQuota? updateCountForQuota often runs in the SNN anyway, so CM can query the SNN and alert users when this path list is non-empty.

FSDirectory#verifyMaxDirItems:
{code:java}
/**
 * Verify children size for fs limit.
 *
 * @throws MaxDirectoryItemsExceededException too many children.
 */
void verifyMaxDirItems(INodeDirectory parent, String parentPath)
    throws MaxDirectoryItemsExceededException {
  final int count = parent.getChildrenList(CURRENT_STATE_ID).size();
  if (count >= maxDirItems) {
    final MaxDirectoryItemsExceededException e
        = new MaxDirectoryItemsExceededException(parentPath, maxDirItems, count);
    if (namesystem.isImageLoaded()) {
      throw e;
    } else {
      // Do not throw if edits log is still being processed
      NameNode.LOG.error("FSDirectory.verifyMaxDirItems: "
          + e.getLocalizedMessage());
    }
  }
}
{code}
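For illustration only, the 90% threshold proposed above could be sketched roughly as follows. NearLimitDirs and its methods are hypothetical names, not part of Hadoop or NameNodeMetrics, and the map of per-directory child counts stands in for whatever the NameNode would actually track.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not a Hadoop class): finds directories close to the item limit. */
final class NearLimitDirs {
  private NearLimitDirs() {
  }

  /** Returns true when count has reached at least ratio * maxDirItems (rounded up). */
  static boolean isNearLimit(int count, int maxDirItems, double ratio) {
    return count >= (long) Math.ceil(maxDirItems * ratio);
  }

  /** Returns, in sorted order, the paths whose child count is near the limit. */
  static List<String> pathsNearLimit(Map<String, Integer> childCounts,
      int maxDirItems, double ratio) {
    List<String> result = new ArrayList<>();
    // Copy into a TreeMap so the reported list has a stable, sorted order.
    for (Map.Entry<String, Integer> e : new TreeMap<>(childCounts).entrySet()) {
      if (isNearLimit(e.getValue(), maxDirItems, ratio)) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

With the limit from the example (1048576) and a ratio of 0.9, a directory would be reported once it holds 943719 or more items. A monitoring tool could then alert whenever the returned list is non-empty.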
[jira] [Created] (HDFS-17399) Ensure atomic transactions when snapshot manager is facing OS resource limit issues
Srinivasu Majeti created HDFS-17399:

Summary: Ensure atomic transactions when snapshot manager is facing OS resource limit issues
Key: HDFS-17399
URL: https://issues.apache.org/jira/browse/HDFS-17399
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.1.1
Reporter: Srinivasu Majeti

One of our customers is hitting OS resource limits (maximum number of processes) on at least one of the NameNodes.
{code:java}
host02:
> As a result, Snapshot creation failed on 14th:
2023-05-14 10:41:28,233 WARN org.apache.hadoop.ipc.Server: IPC Server handler 22 on 8020, call Call#11 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.createSnapshot from xx.xxx.xx.xxx:59442
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.base/java.lang.Thread.start0(Native Method)
    at java.base/java.lang.Thread.start(Thread.java:803)
    at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
    at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
    at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:140)
    at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodeWithLeases(LeaseManager.java:246)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.addSnapshot(DirectorySnapshottableFeature.java:211)
    at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addSnapshot(INodeDirectory.java:288)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.createSnapshot(SnapshotManager.java:463)
    at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.createSnapshot(FSDirSnapshotOp.java:110)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSnapshot(FSNamesystem.java:6767)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createSnapshot(NameNodeRpcServer.java:1871)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.createSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1273)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNameno
{code}
{code:java}
host02 log (NN log)
2023-05-14 10:42:49,983 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://host03.amd.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true, http://host02.domain.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true' to transaction ID 1623400203
2023-05-14 10:42:49,983 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://host01.domain.com:8480/getJournal?jid=cdp01ha=1623400203=-64%3A1444325792%3A1600117814333%3Acluster1546333019=true' to transaction ID 1623400203
2023-05-14 10:42:50,011 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/user/user1, snapshotName=distcp-1546382661--205240459-new, RpcClientId=31353569-0e2e-4272-9acf-a6b71f51242c, RpcCallId=18]
org.apache.hadoop.hdfs.protocol.SnapshotException: Cannot delete snapshot distcp-1546382661--205240459-new from path /user/user1: the snapshot does not exist.
    at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:260)
    at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:296)
{code}
We then identified the wrong records in the edit log and fixed them manually:
{code:java}
The edit causing the problem is "edits_01623400203-01623402627" and contains 38626 lines when converted to XML format.
On further investigation, we discovered that there are 602 transactions attempting to delete a snapshot "distcp-1546382661--205240459-new" which does not exist.

OP_DELETE_SNAPSHOT 1623401061 /user/user1 distcp-1546382661--205240459-new 31353569-0e2e-4272-9acf-a6b71f51242c 1864

Each such transaction spans 10 lines in the XML, so a total of 6020 lines need to be removed from the original 38626 lines.
The number of lines after correction is 38626-6020=32606.
{code}
I am raising this ticket to discuss how to handle this corner case without manually correcting edit logs. For example, Hadoop should have a defensive mechanism for this situation, but it is currently missing.
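The manual correction described above amounts to dropping every delete-snapshot transaction that names the missing snapshot and checking the resulting line count. A rough sketch of that idea follows. EditTxn and EditLogCleanup are hypothetical, simplified types introduced only for this illustration; they are not Hadoop's edit-log classes and this is not how the offline edits tooling works.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified view of one edit-log transaction (not a Hadoop type). */
final class EditTxn {
  final String opcode;
  final long txid;
  final String snapshotName; // null for operations that carry no snapshot name

  EditTxn(String opcode, long txid, String snapshotName) {
    this.opcode = opcode;
    this.txid = txid;
    this.snapshotName = snapshotName;
  }
}

/** Hypothetical helper mirroring the manual cleanup described in the ticket. */
final class EditLogCleanup {
  private EditLogCleanup() {
  }

  /** Keeps every transaction except OP_DELETE_SNAPSHOT records for missingSnapshot. */
  static List<EditTxn> dropDeletesOf(List<EditTxn> txns, String missingSnapshot) {
    List<EditTxn> kept = new ArrayList<>();
    for (EditTxn t : txns) {
      boolean bad = "OP_DELETE_SNAPSHOT".equals(t.opcode)
          && missingSnapshot.equals(t.snapshotName);
      if (!bad) {
        kept.add(t);
      }
    }
    return kept;
  }

  /** Lines left in the XML dump after removing badTxns records of linesPerTxn lines each. */
  static int remainingLines(int totalLines, int badTxns, int linesPerTxn) {
    return totalLines - badTxns * linesPerTxn;
  }
}
```

Calling remainingLines(38626, 602, 10) reproduces the 32606 lines reported in the ticket. A real defensive mechanism would more likely avoid writing such transactions in the first place, which is the open question the ticket raises.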
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/

No changes

-1 overall

The following subsystems voted -1:
    asflicense hadolint mvnsite pathlen unit

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Failed junit tests :

       hadoop.fs.TestFileUtil
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.hdfs.TestFileLengthOnClusterRestart
       hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
       hadoop.hdfs.server.namenode.ha.TestEditLogTailer
       hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
       hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
       hadoop.hdfs.TestLeaseRecovery2
       hadoop.hdfs.server.federation.router.TestRouterQuota
       hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
       hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
       hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.mapreduce.lib.input.TestLineRecordReader
       hadoop.mapred.TestLineRecordReader
       hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
       hadoop.resourceestimator.solver.impl.TestLpSolver
       hadoop.resourceestimator.service.TestResourceEstimatorService
       hadoop.yarn.sls.TestSLSRunner
       hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
       hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
       hadoop.yarn.server.resourcemanager.TestClientRMService
       hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

   cc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/diff-compile-javac-root.txt [488K]

   checkstyle:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-mvnsite-root.txt [572K]

   pathlen:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/pathlen.txt [12K]

   pylint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/diff-patch-shellcheck.txt [72K]

   whitespace:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-javadoc-root.txt [36K]

   unit:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [220K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [456K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1315/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt [28K]