[jira] [Created] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file

2019-05-02 Thread Jim Brennan (JIRA)
Jim Brennan created YARN-9527:
-

 Summary: Rogue LocalizerRunner/ContainerLocalizer repeatedly 
downloading same file
 Key: YARN-9527
 URL: https://issues.apache.org/jira/browse/YARN-9527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.1.2, 2.8.5
Reporter: Jim Brennan


A rogue ContainerLocalizer can get stuck in a loop continuously downloading the 
same file while generating an "Invalid event: LOCALIZED at LOCALIZED" exception 
on each iteration.  Sometimes this continues long enough that it fills up a 
disk or depletes available inodes for the filesystem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ggg resolved YARN-9526.
---
Resolution: Not A Problem

> NM invariably dies if log aggregation is enabled
> 
>
> Key: YARN-9526
> URL: https://issues.apache.org/jira/browse/YARN-9526
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
> Environment: Binary 3.2.0 hadoop release
>Reporter: ggg
>Priority: Major
> Attachments: nm.log, yarn-site.xml
>
>
> NM dies as soon as first task is scheduled if log aggregation is enabled. Log 
> attached.






[jira] [Created] (YARN-9526) NM invariably dies if log aggregation is enabled

2019-05-02 Thread ggg (JIRA)
ggg created YARN-9526:
-

 Summary: NM invariably dies if log aggregation is enabled
 Key: YARN-9526
 URL: https://issues.apache.org/jira/browse/YARN-9526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 3.2.0
 Environment: Binary 3.2.0 hadoop release
Reporter: ggg
 Attachments: nm.log

NM dies as soon as first task is scheduled if log aggregation is enabled. Log 
attached.






[jira] [Created] (YARN-9525) TFile format is not working against s3a remote folder

2019-05-02 Thread Adam Antal (JIRA)
Adam Antal created YARN-9525:


 Summary: TFile format is not working against s3a remote folder
 Key: YARN-9525
 URL: https://issues.apache.org/jira/browse/YARN-9525
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 3.1.2
Reporter: Adam Antal


Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} set 
to an s3a URI throws the following exception during log aggregation:

{noformat}
Cannot create writer for app application_1556199768861_0001. Skip log upload 
this time. 
java.io.IOException: java.io.FileNotFoundException: No such file or directory: 
s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory: 
s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
at 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
... 7 more
{noformat}

This stack trace points to 
{{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
following steps (in a non-rolling log aggregation setup):
- create an FSDataOutputStream
- write out a UUID
- flush
- immediately after that, call getFileStatus to get the length of the log 
file (the bytes we just wrote out). That is where the failure happens: the 
file is not there yet due to eventual consistency.

Maybe we can get rid of that call, so we can use the IFile format against an 
s3a target.
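
One possible way to avoid the status lookup, sketched here with plain JDK 
streams rather than the actual controller code: take the length of the header 
from the output stream's own byte counter instead of asking the filesystem 
for the file's status. The class and method names below are hypothetical. In 
Hadoop the analogous counter would presumably be 
{{FSDataOutputStream.getPos()}}, but whether that fits 
{{initializeWriter}} is an assumption this report does not confirm.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch: not the LogAggregationIndexedFileController code.
public class WriterOffsetSketch {

    /**
     * Writes the UUID header, flushes, and returns the number of bytes
     * written so far according to the stream's own counter. No call is
     * made to the target store to look up the file length.
     */
    static long writeUuidHeader(DataOutputStream out, UUID uuid) throws IOException {
        byte[] header = uuid.toString().getBytes(StandardCharsets.UTF_8);
        out.write(header);
        out.flush();
        // DataOutputStream tracks how many bytes have passed through it.
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory sink stands in for the remote log file.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            long offset = writeUuidHeader(out, UUID.randomUUID());
            System.out.println("header length from stream counter: " + offset);
        }
    }
}
```

Because the counter lives in the writer, the value is available right after 
the flush, regardless of when the object becomes visible in the remote store.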






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2019-05-02 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/

[May 1, 2019 11:48:44 PM] (weichiu) HDFS-14463. Add Log Level link under 
NameNode and DataNode Web UI




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-documentstore
 
   Unread field:TimelineEventSubDoc.java:[line 56] 
   Unread field:TimelineMetricSubDoc.java:[line 44] 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:does not define or use clone method At TaskStatus.java:[lines 
39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:the argument is of type WorkerId At 
WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:null argument At 
WorkerId.java:[lines 114-115] 

Failed junit tests :

   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
   hadoop.yarn.client.cli.TestLogsCLI 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapreduce.v2.app.TestRuntimeEstimators 
   hadoop.hdds.scm.pipeline.TestNodeFailure 
   hadoop.ozone.om.TestOzoneManagerConfiguration 
   hadoop.ozone.scm.TestSCMNodeManagerMXBean 
   hadoop.ozone.web.TestOzoneRestWithMiniCluster 
   hadoop.ozone.scm.node.TestSCMNodeMetrics 
   hadoop.ozone.web.client.TestBuckets 
   hadoop.ozone.om.TestOmMetrics 
   hadoop.hdds.scm.pipeline.TestPipelineClose 
   hadoop.hdds.scm.pipeline.TestSCMRestart 
   hadoop.ozone.ozShell.TestOzoneDatanodeShell 
   hadoop.ozone.om.TestOmAcls 
   hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException 
   hadoop.ozone.client.rpc.TestSecureOzoneRpcClient 
   hadoop.ozone.om.TestOmBlockVersioning 
   hadoop.ozone.scm.TestGetCommittedBlockLengthAndPutKey 
   hadoop.ozone.scm.pipeline.TestSCMPipelineMetrics 
   hadoop.ozone.om.TestScmSafeMode 
   hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules 
   hadoop.ozone.web.client.TestVolume 
   hadoop.ozone.client.rpc.TestReadRetries 
   hadoop.fs.ozone.contract.ITestOzoneContractGetFileStatus 
   hadoop.fs.ozone.contract.ITestOzoneContractDelete 
   hadoop.fs.ozone.contract.ITestOzoneContractSeek 
   hadoop.fs.ozone.contract.ITestOzoneContractMkdir 
   hadoop.fs.ozone.contract.ITestOzoneContractRootDir 
   hadoop.fs.ozone.contract.ITestOzoneContractRename 
   hadoop.fs.ozone.contract.ITestOzoneContractDistCp 
   hadoop.fs.ozone.contract.ITestOzoneContractOpen 
   hadoop.ozone.fsck.TestContainerMapper 
   hadoop.ozone.freon.TestFreonWithDatanodeFastRestart 
   hadoop.ozone.freon.TestFreonWithPipelineDestroy 
   hadoop.ozone.freon.TestDataValidateWithSafeByteOperations 
   hadoop.ozone.freon.TestRandomKeyGenerator 
   hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-compile-javac-root.txt
  [332K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-checkstyle-root.txt
  [17M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-pylint.txt
  [84K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1124/artifact/out/diff-patch-shelldocs.txt
  [44K]

   whitespace:

   

Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2019-05-02 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   module:hadoop-common-project/hadoop-common 
   Class org.apache.hadoop.fs.GlobalStorageStatistics defines non-transient 
non-serializable instance field map In GlobalStorageStatistics.java:instance 
field map In GlobalStorageStatistics.java 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.ipc.TestIPCServerResponder 
   hadoop.ipc.TestCallQueueManager 
   hadoop.ha.TestActiveStandbyElectorRealZK 
   hadoop.ipc.TestProtoBufRpc 
   hadoop.fs.permission.TestStickyBit 
   hadoop.fs.viewfs.TestViewFileSystemHdfs 
   hadoop.fs.TestHDFSFileContextMainOperations 
   hadoop.fs.viewfs.TestViewFsWithAcls 
   hadoop.fs.contract.hdfs.TestHDFSContractCreate 
   hadoop.fs.TestSWebHdfsFileContextMainOperations 
   hadoop.fs.contract.hdfs.TestHDFSContractGetFileStatus 
   hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot 
   hadoop.fs.viewfs.TestViewFsWithXAttrs 
   hadoop.fs.contract.hdfs.TestHDFSContractConcat 
   hadoop.fs.TestUnbuffer 
   hadoop.fs.TestEnhancedByteBufferAccess 
   hadoop.fs.contract.hdfs.TestHDFSContractMkdir 
   hadoop.fs.contract.hdfs.TestHDFSContractOpen 
   hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
   hadoop.yarn.server.resourcemanager.TestResourceTrackerService 
   hadoop.yarn.server.resourcemanager.TestLeaderElectorService 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
   hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   hadoop.yarn.client.api.impl.TestYarnClientWithReservation 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-cc-root-jdk1.8.0_191.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-compile-javac-root-jdk1.8.0_191.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-shellcheck.txt
  [72K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/whitespace-tabs.txt
  [1.2M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/309/artifact/out/xml.txt
  [12K]

   findbugs: