[jira] [Resolved] (YARN-8516) branch-2.8 compilation failure for hadoop-yarn-server-nodemanager module
[ https://issues.apache.org/jira/browse/YARN-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan resolved YARN-8516.
----------------------------------
    Resolution: Duplicate

Thanks [~rohithsharma]. I am handling this as an addendum patch for YARN-8473. Apologies for missing the branch-2.8 compilation.

> branch-2.8 compilation failure for hadoop-yarn-server-nodemanager module
> ------------------------------------------------------------------------
>
>                 Key: YARN-8516
>                 URL: https://issues.apache.org/jira/browse/YARN-8516
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Priority: Blocker
>
> branch-2.8 compilation is failing with the error below:
> {noformat}
> [INFO] BUILD FAILURE
> [INFO] Total time: 6.142 s
> [INFO] Finished at: 2018-07-11T08:28:24+05:30
> [INFO] Final Memory: 64M/790M
> [WARNING] The requested profile "yarn-ui" could not be activated because it does not exist.
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-server-nodemanager: Compilation failure
> [ERROR] /Users/rsharmaks/Repos/Apache/Commit_Repos/branch-2.8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java:[333,12] no suitable method found for warn(java.lang.String,org.apache.hadoop.yarn.api.records.ContainerId,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState)
> [ERROR]     method org.apache.commons.logging.Log.warn(java.lang.Object) is not applicable
> [ERROR]       (actual and formal argument lists differ in length)
> [ERROR]     method org.apache.commons.logging.Log.warn(java.lang.Object,java.lang.Throwable) is not applicable
> [ERROR]       (actual and formal argument lists differ in length)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8516) Compilation error for branch-2.8
Rohith Sharma K S created YARN-8516:
---------------------------------------
             Summary: Compilation error for branch-2.8
                 Key: YARN-8516
                 URL: https://issues.apache.org/jira/browse/YARN-8516
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Rohith Sharma K S

branch-2.8 compilation is failing with the error below:

{noformat}
[INFO] BUILD FAILURE
[INFO] Total time: 6.142 s
[INFO] Finished at: 2018-07-11T08:28:24+05:30
[INFO] Final Memory: 64M/790M
[WARNING] The requested profile "yarn-ui" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-server-nodemanager: Compilation failure
[ERROR] /Users/rsharmaks/Repos/Apache/Commit_Repos/branch-2.8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java:[333,12] no suitable method found for warn(java.lang.String,org.apache.hadoop.yarn.api.records.ContainerId,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState)
[ERROR]     method org.apache.commons.logging.Log.warn(java.lang.Object) is not applicable
[ERROR]       (actual and formal argument lists differ in length)
[ERROR]     method org.apache.commons.logging.Log.warn(java.lang.Object,java.lang.Throwable) is not applicable
[ERROR]       (actual and formal argument lists differ in length)
{noformat}
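For context on the error above: commons-logging's Log interface declares only warn(Object) and warn(Object, Throwable), so an SLF4J-style call with "{}" placeholders and varargs (which compiles on trunk, where SLF4J is used) cannot compile against it on branch-2.8. A minimal runnable sketch of the difference, using a hypothetical stand-in Log interface and made-up container values rather than the actual Hadoop code:

```java
import java.text.MessageFormat;

public class LogCompatDemo {
    // Minimal stand-in for org.apache.commons.logging.Log, which only
    // declares warn(Object) and warn(Object, Throwable) -- no varargs.
    interface Log {
        void warn(Object message);
    }

    // Hypothetical helper: pre-format the message so it fits warn(Object).
    public static String format(String containerId, String state) {
        return MessageFormat.format(
                "Event for {0} ignored in state {1}", containerId, state);
    }

    public static void main(String[] args) {
        Log log = msg -> System.out.println("WARN " + msg);

        // SLF4J-style call that fails to compile against commons-logging:
        // log.warn("Event for {} ignored in state {}", containerId, state);

        // Compiles everywhere: hand a fully built String to warn(Object).
        log.warn(format("container_1_0001_01_000002", "FINISHED"));
    }
}
```

This is why an addendum patch is needed for branches still on commons-logging: the message must be assembled into a single Object before the warn() call.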
[jira] [Created] (YARN-8515) container-executor can crash with SIGPIPE after nodemanager restart
Jim Brennan created YARN-8515:
---------------------------------
             Summary: container-executor can crash with SIGPIPE after nodemanager restart
                 Key: YARN-8515
                 URL: https://issues.apache.org/jira/browse/YARN-8515
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Jim Brennan
            Assignee: Jim Brennan

When running with docker on large clusters, we have noticed that sometimes docker containers are not removed - they remain in the exited state, and the corresponding container-executor is no longer running. Upon investigation, we noticed that this always seemed to happen after a nodemanager restart.

The sequence leading to the stranded docker containers is:
# Nodemanager restarts
# Containers are recovered and then run for a while
# Containers are killed for some (legitimate) reason
# Container-executor exits without removing the docker container

After reproducing this on a test cluster, we found that the container-executor was exiting due to a SIGPIPE. What is happening is that the shell command executor used to start container-executor has threads reading from c-e's stdout and stderr. When the NM is restarted, these threads are killed. When the container-executor then continues executing after the container exits with an error, it tries to write to stderr (ERRORFILE) and gets a SIGPIPE. Since SIGPIPE is not handled, this crashes the container-executor before it can actually remove the docker container.

We ran into this in branch-2.8. The way docker containers are removed has been completely redesigned in trunk, so I don't think it will lead to this exact failure, but after an NM restart, potentially any write to stderr or stdout in the container-executor could cause it to crash.
[jira] [Created] (YARN-8514) YARN RegistryDNS throws NPE when Kerberos tgt expires
Eric Yang created YARN-8514:
-------------------------------
             Summary: YARN RegistryDNS throws NPE when Kerberos tgt expires
                 Key: YARN-8514
                 URL: https://issues.apache.org/jira/browse/YARN-8514
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Eric Yang

After the Kerberos ticket expires, RegistryDNS throws an NPE:

{code:java}
2018-07-06 01:26:25,025 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[TGT Renewer for rm/y001.l42scl.hortonworks@l42scl.hortonworks.com,5,main] threw an Exception.
java.lang.NullPointerException
        at javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
        at org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
        at java.lang.Thread.run(Thread.java:745){code}
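A destroyed or expired KerberosTicket can surface a null end time to the TGT renewer thread, which is consistent with the NPE in the stack trace above. A hedged sketch of the kind of defensive guard a fix might add (the method name, the -1 sentinel, and the 80%-of-remaining-lifetime policy are all illustrative assumptions, not the actual UserGroupInformation code):

```java
import java.util.Date;

public class TgtRenewSketch {
    /** Hypothetical guard: compute the next renewal time only when the
     *  ticket still reports a valid end time. A destroyed/expired ticket
     *  can yield null here and would otherwise NPE the renewer thread.
     *  Returns -1 to tell the caller to re-login instead of crashing. */
    public static long nextRefreshMillis(Date endTime, long nowMillis) {
        if (endTime == null) {
            return -1;
        }
        // Illustrative policy: refresh at 80% of the remaining lifetime.
        return nowMillis + (long) ((endTime.getTime() - nowMillis) * 0.8f);
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(nextRefreshMillis(new Date(now + 10_000L), now));
        System.out.println(nextRefreshMillis(null, now)); // would have been an NPE
    }
}
```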
[jira] [Created] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
Che Yufei created YARN-8513:
-------------------------------
             Summary: CapacityScheduler infinite loop when queue is near fully utilized
                 Key: YARN-8513
                 URL: https://issues.apache.org/jira/browse/YARN-8513
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler, yarn
    Affects Versions: 2.9.1
         Environment: Ubuntu 14.04.5
YARN is configured with one label and 5 queues.
            Reporter: Che Yufei

ResourceManager sometimes stops responding to any request when a queue is nearly fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can. After an RM restart, it can recover running jobs and start accepting new ones.

It seems CapacityScheduler is in an infinite loop printing the following log messages (more than 25,000 lines in a second):

{{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used= cluster=}}
{{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal}}
{{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_01 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource= type=NODE_LOCAL requestedPartition=}}

I have encountered this problem several times after upgrading to YARN 2.9.1, while the same configuration works fine under version 2.7.3. YARN-4477 is an infinite loop bug in FairScheduler; not sure if this is a similar problem.
Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
Welcome Jonathan.

http://hadoop.apache.org/releases.html states: "Hadoop is released as source code tarballs with corresponding binary tarballs for convenience." Andrew Wang said "The binary artifacts (including JARs) are technically just convenience artifacts", and it seems not an uncommon practice to do follow-up builds to release maven artifacts.

IIRC, Andrew once shared with me that we started in 3.x to use a single build to do both release binaries creation and maven artifacts deployment; prior releases used multiple builds. Referring to https://wiki.apache.org/hadoop/HowToRelease:

- 3.x: step 4 in the "Creating the release candidate (X.Y.Z-RC)" section does both release binaries creation and maven artifacts deployment.
- Prior to 3.x: step 4 does release binary creation, and step 10 does maven artifacts deployment, *each step doing its own build, so two builds here*. As a matter of fact, I did not run step 10 for 3.0.3.

That said, I agree that ideally it's better to generate release binaries and deploy maven artifacts from the same single build.

Hope it helps. Welcome other folks to chime in.

Best,

--Yongjun

On Mon, Jul 9, 2018 at 2:08 PM, Jonathan Eagles wrote:

> Thank you, Yongjun Zhang, for resolving this issue for me. I have verified
> the 3.0.3 build is now working for me for tez to specify as a hadoop
> dependency.
>
> As for release procedure, can someone comment on what to do now that the
> artifacts published to maven are different than the voted-on artifacts? I
> believe the source code is what is voted on and the maven artifacts are
> just for convenience, but would like an "official" answer.
>
> Reference:
> https://issues.apache.org/jira/browse/TEZ-3955
>
> Regards,
> jeagles
>
> On Mon, Jul 9, 2018 at 12:26 PM, Yongjun Zhang wrote:
>
>> Hi Jonathan,
>>
>> I have updated the artifacts, so now
>>
>> https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.2~~
>> https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.3~~
>>
>> are more consistent, except that 3.0.3 has an extra entry for rbf. Would
>> you please try again?
>>
>> The propagation to
>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>> will take some time. I did nothing different than last time, so keep
>> fingers crossed that it will propagate there.
>>
>> Thanks Sammi Chen and Andrew Wang for info and advice, and sorry for the
>> inconvenience again.
>>
>> Best,
>>
>> --Yongjun
>>
>> On Mon, Jul 2, 2018 at 9:30 AM, Jonathan Eagles wrote:
>>
>>> Release 3.0.3 is still broken due to the missing artifacts. Any update
>>> on when these artifacts will be published?
>>>
>>> On Wed, Jun 27, 2018 at 8:25 PM, Chen, Sammi wrote:
>>>
>>> Hi Yongjun,
>>>
>>> The artifacts will be pushed to
>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>>> after step 6 of the Publishing steps. For 2.9.1, I remember I absolutely
>>> did the step before. I redid step 6 today and now 2.9.1 is pushed to the
>>> mvn repo. You can double check it. I suspect Nexus may sometimes fail to
>>> notify the user when there are unexpected failures.
>>>
>>> Bests,
>>> Sammi
>>>
>>> *From:* Yongjun Zhang [mailto:yzh...@cloudera.com]
>>> *Sent:* Sunday, June 17, 2018 12:17 PM
>>> *To:* Jonathan Eagles ; Chen, Sammi <sammi.c...@intel.com>
>>> *Cc:* Eric Payne ; Hadoop Common <common-...@hadoop.apache.org>; Hdfs-dev ; mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
>>> *Subject:* Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>>>
>>> + Junping, Sammi
>>>
>>> Hi Jonathan,
>>>
>>> Many thanks for reporting the issues and sorry for the inconvenience.
>>>
>>> 1. Shouldn't the build be looking for artifacts in
>>> https://repository.apache.org/content/repositories/releases
>>> rather than
>>> https://repository.apache.org/content/repositories/snapshots ?
>>>
>>> 2. Not seeing the artifact published here as well:
>>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>>>
>>> Indeed, I did not see 2.9.1 there either, so I have included Sammi Chen.
>>>
>>> Hi Junping, would you please share which step in
>>> https://wiki.apache.org/hadoop/HowToRelease should have done this?
>>>
>>> Thanks a lot.
>>>
>>> --Yongjun
>>>
>>> On Fri, Jun 15, 2018 at 10:52 PM, Jonathan Eagles wrote:
>>>
>>> Upgraded Tez dependency to hadoop 3.0.3 and found this issue. Anyone
>>> else seeing this issue?
>>>
>>> [ERROR] Failed to execute goal on project hadoop-shim: Could not
>>> resolve dependencies for project
>>> org.apache.tez:hadoop-shim:jar:0.10.0-SNAPSHOT: Failed to collect
>>> dependencies at
[jira] [Created] (YARN-8512) ATSv2 entities are not published to HBase
Rohith Sharma K S created YARN-8512:
---------------------------------------
             Summary: ATSv2 entities are not published to HBase
                 Key: YARN-8512
                 URL: https://issues.apache.org/jira/browse/YARN-8512
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Yesha Vora
            Assignee: Rohith Sharma K S

This is observed when the 1st attempt's master container dies and the 2nd attempt's master container is launched on an NM where old containers are running but no master container has run:

||Attempt||NM1||NM2||Action||
|attempt-1|master container, i.e. container-1-1|container-1-2|master container died|
|attempt-2|NA|container-1-2 and master container container-2-1|NA|

In the above scenario, the NM doesn't identify the flowContext and logs the warning below:

{noformat}
2018-07-10 00:44:38,285 WARN storage.HBaseTimelineWriterImpl (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: flowName=null appId=application_1531175172425_0001 userId=hbase clusterId=yarn-cluster . Not proceeding with writing to hbase
2018-07-10 00:44:38,560 WARN storage.HBaseTimelineWriterImpl (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: flowName=null appId=application_1531175172425_0001 userId=hbase clusterId=yarn-cluster . Not proceeding with writing to hbase
{noformat}
[jira] [Created] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM
Weiwei Yang created YARN-8511:
---------------------------------
             Summary: When AM releases a container, RM removes allocation tags before it is released by NM
                 Key: YARN-8511
                 URL: https://issues.apache.org/jira/browse/YARN-8511
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
    Affects Versions: 3.1.0
            Reporter: Weiwei Yang
            Assignee: Weiwei Yang

Users leverage placement constraints with allocation tags to avoid port conflicts between apps, but we found they sometimes still get port conflicts. This is a similar issue to YARN-4148: the RM immediately removes allocation tags once AM#allocate asks to release a container, however the container on the NM has some delay until it actually gets killed and releases the port. We should let the RM remove allocation tags AFTER the NM confirms the containers are released.