[jira] [Created] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
Jim Brennan created YARN-9527:
---------------------------------

             Summary: Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
                 Key: YARN-9527
                 URL: https://issues.apache.org/jira/browse/YARN-9527
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.1.2, 2.8.5
            Reporter: Jim Brennan

A rogue ContainerLocalizer can get stuck in a loop, continuously downloading the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" exception on each iteration. Sometimes this continues long enough that it fills up a disk or depletes the available inodes for the filesystem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9526) NM invariably dies if log aggregation is enabled
     [ https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ggg resolved YARN-9526.
-----------------------
    Resolution: Not A Problem

> NM invariably dies if log aggregation is enabled
> ------------------------------------------------
>
>                 Key: YARN-9526
>                 URL: https://issues.apache.org/jira/browse/YARN-9526
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 3.2.0
>         Environment: Binary 3.2.0 hadoop release
>            Reporter: ggg
>            Priority: Major
>         Attachments: nm.log, yarn-site.xml
>
> NM dies as soon as the first task is scheduled if log aggregation is enabled. Log attached.
[jira] [Created] (YARN-9526) NM invariably dies if log aggregation is enabled
ggg created YARN-9526:
----------------------

             Summary: NM invariably dies if log aggregation is enabled
                 Key: YARN-9526
                 URL: https://issues.apache.org/jira/browse/YARN-9526
             Project: Hadoop YARN
          Issue Type: Bug
          Components: log-aggregation
    Affects Versions: 3.2.0
         Environment: Binary 3.2.0 hadoop release
            Reporter: ggg
         Attachments: nm.log

NM dies as soon as the first task is scheduled if log aggregation is enabled. Log attached.
[jira] [Created] (YARN-9525) TFile format is not working against s3a remote folder
Adam Antal created YARN-9525:
--------------------------------

             Summary: TFile format is not working against s3a remote folder
                 Key: YARN-9525
                 URL: https://issues.apache.org/jira/browse/YARN-9525
             Project: Hadoop YARN
          Issue Type: Bug
          Components: log-aggregation
    Affects Versions: 3.1.2
            Reporter: Adam Antal

Using the IndexedFileFormat, with {{yarn.nodemanager.remote-app-log-dir}} configured to an s3a URI, the following exception is thrown during log aggregation:

{noformat}
Cannot create writer for app application_1556199768861_0001. Skip log upload this time.
java.io.IOException: java.io.FileNotFoundException: No such file or directory: s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
	at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
	at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
	at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
	at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
	at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
	at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
	... 7 more
{noformat}

This stack trace points to {{LogAggregationIndexedFileController#initializeWriter}}, where we do the following steps (in a non-rolling log aggregation setup):
- create an FSDataOutputStream
- write out a UUID
- flush
- immediately after that, call getFileStatus to get the length of the log file (the bytes we just wrote out), and that is where the failure happens: the file is not there yet due to eventual consistency.

Maybe we can get rid of that getFileStatus call, so the IFile format can be used against an s3a target.
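To illustrate the failure mode described above, here is a minimal toy model (NOT Hadoop code; all class and path names are hypothetical) of why "write, flush, then immediately getFileStatus" can fail against an eventually consistent object store: the freshly written object is durable but not yet visible to a subsequent metadata lookup.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Toy model of an eventually consistent store: writes land in a "backing"
// view first, and only become visible to metadata reads after propagation.
class EventuallyConsistentStore {
    private final Map<String, byte[]> backing = new HashMap<>();
    private final Map<String, byte[]> visible = new HashMap<>();

    // Models the FSDataOutputStream write + flush: durable, but not yet stat-able.
    void write(String path, byte[] data) {
        backing.put(path, data);
    }

    // Models a getFileStatus-style metadata read: sees only the visible view.
    long getFileLength(String path) throws FileNotFoundException {
        byte[] d = visible.get(path);
        if (d == null) {
            throw new FileNotFoundException("No such file or directory: " + path);
        }
        return d.length;
    }

    // Models the store's metadata eventually catching up with the write.
    void propagate() {
        visible.putAll(backing);
    }
}

public class IFileS3Sketch {
    public static void main(String[] args) {
        EventuallyConsistentStore fs = new EventuallyConsistentStore();
        String path = "s3a://bucket/logs/app_0001/node_8041"; // hypothetical path
        fs.write(path, new byte[36]); // e.g. the UUID the writer flushes out
        try {
            // The immediate stat after the flush, as in the reported trace:
            fs.getFileLength(path);
            System.out.println("visible immediately");
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException: not yet visible");
        }
        fs.propagate();
        try {
            System.out.println("after propagation, length=" + fs.getFileLength(path));
        } catch (FileNotFoundException e) {
            System.out.println("still missing");
        }
    }
}
```

In this sketch the first stat throws exactly as in the report, while the same call succeeds once the metadata has propagated; avoiding the immediate stat (e.g. by tracking the number of bytes written locally) would sidestep the window entirely.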
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/

[May 1, 2019 11:48:44 PM] (weichiu) HDFS-14463. Add Log Level link under NameNode and DataNode Web UI

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-documentstore
        Unread field:TimelineEventSubDoc.java:[line 56]
        Unread field:TimelineMetricSubDoc.java:[line 44]

    FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
        Class org.apache.hadoop.applications.mawo.server.common.TaskStatus implements Cloneable but does not define or use clone method At TaskStatus.java:[lines 39-346]
        Equals method for org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument is of type WorkerId At WorkerId.java:[line 114]
        org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does not check for null argument At WorkerId.java:[lines 114-115]

    Failed junit tests:
        hadoop.hdfs.web.TestWebHdfsTimeouts
        hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
        hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2
        hadoop.yarn.client.cli.TestLogsCLI
        hadoop.yarn.applications.distributedshell.TestDistributedShell
        hadoop.mapreduce.v2.app.TestRuntimeEstimators
        hadoop.hdds.scm.pipeline.TestNodeFailure
        hadoop.ozone.om.TestOzoneManagerConfiguration
        hadoop.ozone.scm.TestSCMNodeManagerMXBean
        hadoop.ozone.web.TestOzoneRestWithMiniCluster
        hadoop.ozone.scm.node.TestSCMNodeMetrics
        hadoop.ozone.web.client.TestBuckets
        hadoop.ozone.om.TestOmMetrics
        hadoop.hdds.scm.pipeline.TestPipelineClose
        hadoop.hdds.scm.pipeline.TestSCMRestart
        hadoop.ozone.ozShell.TestOzoneDatanodeShell
        hadoop.ozone.om.TestOmAcls
        hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException
        hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
        hadoop.ozone.om.TestOmBlockVersioning
        hadoop.ozone.scm.TestGetCommittedBlockLengthAndPutKey
        hadoop.ozone.scm.pipeline.TestSCMPipelineMetrics
        hadoop.ozone.om.TestScmSafeMode
        hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules
        hadoop.ozone.web.client.TestVolume
        hadoop.ozone.client.rpc.TestReadRetries
        hadoop.fs.ozone.contract.ITestOzoneContractGetFileStatus
        hadoop.fs.ozone.contract.ITestOzoneContractDelete
        hadoop.fs.ozone.contract.ITestOzoneContractSeek
        hadoop.fs.ozone.contract.ITestOzoneContractMkdir
        hadoop.fs.ozone.contract.ITestOzoneContractRootDir
        hadoop.fs.ozone.contract.ITestOzoneContractRename
        hadoop.fs.ozone.contract.ITestOzoneContractDistCp
        hadoop.fs.ozone.contract.ITestOzoneContractOpen
        hadoop.ozone.fsck.TestContainerMapper
        hadoop.ozone.freon.TestFreonWithDatanodeFastRestart
        hadoop.ozone.freon.TestFreonWithPipelineDestroy
        hadoop.ozone.freon.TestDataValidateWithSafeByteOperations
        hadoop.ozone.freon.TestRandomKeyGenerator
        hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations

   cc:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-compile-javac-root.txt [332K]

   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-checkstyle-root.txt [17M]

   hadolint:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-hadolint.txt [4.0K]

   pathlen:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/pathlen.txt [12K]

   pylint:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-pylint.txt [84K]

   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-shellcheck.txt [20K]

   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-shelldocs.txt [44K]

   whitespace:
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    XML : Parsing Error(s):
        hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
        hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

    FindBugs : module:hadoop-common-project/hadoop-common
        Class org.apache.hadoop.fs.GlobalStorageStatistics defines non-transient non-serializable instance field map In GlobalStorageStatistics.java

    FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
        Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

    Failed junit tests:
        hadoop.ipc.TestIPCServerResponder
        hadoop.ipc.TestCallQueueManager
        hadoop.ha.TestActiveStandbyElectorRealZK
        hadoop.ipc.TestProtoBufRpc
        hadoop.fs.permission.TestStickyBit
        hadoop.fs.viewfs.TestViewFileSystemHdfs
        hadoop.fs.TestHDFSFileContextMainOperations
        hadoop.fs.viewfs.TestViewFsWithAcls
        hadoop.fs.contract.hdfs.TestHDFSContractCreate
        hadoop.fs.TestSWebHdfsFileContextMainOperations
        hadoop.fs.contract.hdfs.TestHDFSContractGetFileStatus
        hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot
        hadoop.fs.viewfs.TestViewFsWithXAttrs
        hadoop.fs.contract.hdfs.TestHDFSContractConcat
        hadoop.fs.TestUnbuffer
        hadoop.fs.TestEnhancedByteBufferAccess
        hadoop.fs.contract.hdfs.TestHDFSContractMkdir
        hadoop.fs.contract.hdfs.TestHDFSContractOpen
        hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
        hadoop.yarn.server.resourcemanager.TestResourceTrackerService
        hadoop.yarn.server.resourcemanager.TestLeaderElectorService
        hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2
        hadoop.yarn.client.TestApplicationClientProtocolOnHA
        hadoop.yarn.client.api.impl.TestYarnClientWithReservation

   cc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [328K]

   cc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-cc-root-jdk1.8.0_191.txt [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-javac-root-jdk1.8.0_191.txt [308K]

   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-checkstyle-root.txt [16M]

   hadolint:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-hadolint.txt [4.0K]

   pathlen:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/pathlen.txt [12K]

   pylint:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-pylint.txt [24K]

   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-shellcheck.txt [72K]

   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-shelldocs.txt [8.0K]

   whitespace:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/whitespace-eol.txt [12M]
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/whitespace-tabs.txt [1.2M]

   xml:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/xml.txt [12K]

   findbugs: