[jira] [Resolved] (YARN-8516) branch-2.8 compilation failure for hadoop-yarn-server-nodemanager module

2018-07-10 Thread Sunil Govindan (JIRA)


 [ https://issues.apache.org/jira/browse/YARN-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan resolved YARN-8516.
--
Resolution: Duplicate

Thanks [~rohithsharma]. I am handling this as an addendum patch for YARN-8473. 
Apologies for missing the branch-2.8 compilation.

> branch-2.8 compilation failure for hadoop-yarn-server-nodemanager module
> 
>
> Key: YARN-8516
> URL: https://issues.apache.org/jira/browse/YARN-8516
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> branch-2.8 compilation is failing with the error below
> {noformat}
> [INFO] 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 6.142 s
> [INFO] Finished at: 2018-07-11T08:28:24+05:30
> [INFO] Final Memory: 64M/790M
> [INFO] 
> 
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hadoop-yarn-server-nodemanager: Compilation failure
> [ERROR] 
> /Users/rsharmaks/Repos/Apache/Commit_Repos/branch-2.8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java:[333,12]
>  no suitable method found for 
> warn(java.lang.String,org.apache.hadoop.yarn.api.records.ContainerId,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState)
> [ERROR] method org.apache.commons.logging.Log.warn(java.lang.Object) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.commons.logging.Log.warn(java.lang.Object,java.lang.Throwable) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8516) Compilation error for branch-2.8

2018-07-10 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8516:
---

 Summary: Compilation error for branch-2.8
 Key: YARN-8516
 URL: https://issues.apache.org/jira/browse/YARN-8516
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rohith Sharma K S


branch-2.8 compilation is failing with the error below
{noformat}
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 6.142 s
[INFO] Finished at: 2018-07-11T08:28:24+05:30
[INFO] Final Memory: 64M/790M
[INFO] 
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hadoop-yarn-server-nodemanager: Compilation failure
[ERROR] 
/Users/rsharmaks/Repos/Apache/Commit_Repos/branch-2.8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java:[333,12]
 no suitable method found for 
warn(java.lang.String,org.apache.hadoop.yarn.api.records.ContainerId,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl,org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState)
[ERROR] method org.apache.commons.logging.Log.warn(java.lang.Object) is not 
applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.commons.logging.Log.warn(java.lang.Object,java.lang.Throwable) is 
not applicable
[ERROR]   (actual and formal argument lists differ in length)
{noformat}
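For context on why this fails: branch-2.8 still logs through Apache Commons 
Logging, whose Log.warn only accepts (Object) or (Object, Throwable), while 
the offending call uses SLF4J-style parameterized arguments. A minimal sketch 
of the mismatch and a commons-logging-compatible rewrite (class, method, and 
argument names are illustrative here, not the actual ApplicationImpl code):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LogCompatSketch {
  private static final Log LOG = LogFactory.getLog(LogCompatSketch.class);

  void onUnexpectedEvent(Object containerId, Object app, Object state) {
    // Does not compile on branch-2.8: commons-logging has no
    // warn(String, Object...) overload, only warn(Object) and
    // warn(Object, Throwable):
    //   LOG.warn("Event from {} while {} is in state {}",
    //       containerId, app, state);

    // Compatible rewrite: build the message eagerly.
    LOG.warn("Event from " + containerId + " while " + app
        + " is in state " + state);
  }
}
{code}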



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8515) container-executor can crash with SIGPIPE after nodemanager restart

2018-07-10 Thread Jim Brennan (JIRA)
Jim Brennan created YARN-8515:
-

 Summary: container-executor can crash with SIGPIPE after 
nodemanager restart
 Key: YARN-8515
 URL: https://issues.apache.org/jira/browse/YARN-8515
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jim Brennan
Assignee: Jim Brennan


When running with docker on large clusters, we have noticed that sometimes 
docker containers are not removed - they remain in the exited state, and the 
corresponding container-executor is no longer running.  Upon investigation, we 
noticed that this always seemed to happen after a nodemanager restart.   The 
sequence leading to the stranded docker containers is:
 # Nodemanager restarts
 # Containers are recovered and then run for a while
 # Containers are killed for some (legitimate) reason
 # Container-executor exits without removing the docker container.

After reproducing this on a test cluster, we found that the container-executor 
was exiting due to a SIGPIPE.

The shell command executor used to start container-executor has threads 
reading from c-e's stdout and stderr. When the NM is restarted, these threads 
are killed. When the container-executor then continues executing after the 
container exits with an error, it tries to write to stderr (ERRORFILE) and 
gets a SIGPIPE. Since SIGPIPE is not handled, this crashes the 
container-executor before it can actually remove the docker container.

We ran into this in branch 2.8.  The way docker containers are removed has been 
completely redesigned in trunk, so I don't think it will lead to this exact 
failure, but after an NM restart, potentially any write to stderr or stdout in 
the container-executor could cause it to crash.
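
To make the mechanism concrete, here is a minimal parent-side sketch in Java 
(not the actual Hadoop Shell executor code; the command line is a 
placeholder). The parent drains the child's stderr with a helper thread; when 
the NM restarts, that thread dies with the JVM and the read end of the pipe 
closes, so any later write by container-executor to stderr raises SIGPIPE:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PipeDrainSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder command: stands in for the real container-executor launch.
    Process child = new ProcessBuilder("container-executor").start();

    Thread stderrDrainer = new Thread(() -> {
      try (BufferedReader r = new BufferedReader(
          new InputStreamReader(child.getErrorStream()))) {
        String line;
        while ((line = r.readLine()) != null) {
          System.err.println("[c-e] " + line);
        }
      } catch (Exception ignored) {
        // When this thread dies, nobody reads the pipe anymore.
      }
    });
    // Daemon thread: it dies with the JVM on an NM restart, closing the
    // pipe's read end. The child's next write to stderr then gets SIGPIPE,
    // which is fatal if the child has not installed a handler.
    stderrDrainer.setDaemon(true);
    stderrDrainer.start();

    child.waitFor();
  }
}
{code}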

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8514) YARN RegistryDNS throws NPE when Kerberos tgt expires

2018-07-10 Thread Eric Yang (JIRA)
Eric Yang created YARN-8514:
---

 Summary: YARN RegistryDNS throws NPE when Kerberos tgt expires
 Key: YARN-8514
 URL: https://issues.apache.org/jira/browse/YARN-8514
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Yang


After the Kerberos ticket expires, RegistryDNS throws an NPE:
{code:java}
2018-07-06 01:26:25,025 ERROR yarn.YarnUncaughtExceptionHandler 
(YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[TGT 
Renewer for rm/y001.l42scl.hortonworks@l42scl.hortonworks.com,5,main] threw 
an Exception.
java.lang.NullPointerException
        at javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
        at org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
        at java.lang.Thread.run(Thread.java:745)
{code}
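
The trace shows the TGT renewer thread in UserGroupInformation calling 
KerberosTicket.getEndTime() on a ticket that has been destroyed after expiry, 
which is what produces the NPE. One defensive pattern for such a call site (a 
hypothetical helper, sketched under that assumption, not necessarily the fix 
that will land):

{code:java}
import java.util.Date;
import javax.security.auth.kerberos.KerberosTicket;

final class TgtGuard {
  /** Returns the TGT end time in milliseconds, or -1 if unusable. */
  static long safeEndTime(KerberosTicket tgt) {
    // A destroyed ticket can throw NullPointerException from getEndTime(),
    // so check isDestroyed() first and treat a missing end time as expired.
    if (tgt == null || tgt.isDestroyed()) {
      return -1L;
    }
    Date end = tgt.getEndTime();
    return end == null ? -1L : end.getTime();
  }
}
{code}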



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-07-10 Thread Che Yufei (JIRA)
Che Yufei created YARN-8513:
---

 Summary: CapacityScheduler infinite loop when queue is near fully 
utilized
 Key: YARN-8513
 URL: https://issues.apache.org/jira/browse/YARN-8513
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 2.9.1
 Environment: Ubuntu 14.04.5

YARN is configured with one label and 5 queues.
Reporter: Che Yufei


Sometimes the ResourceManager stops responding to requests when a queue is 
nearly fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL will. 
After a restart, the RM recovers running jobs and starts accepting new ones.

CapacityScheduler appears to be stuck in an infinite loop, printing the 
following log messages (more than 25,000 lines in a second):

{noformat}
2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used= cluster=
2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal
2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_01 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource= type=NODE_LOCAL requestedPartition=
{noformat}

 

I have encountered this problem several times after upgrading to YARN 2.9.1, 
while the same configuration works fine under version 2.7.3.

 

YARN-4477 is an infinite loop bug in FairScheduler; I am not sure whether this 
is a similar problem. A schematic sketch of the suspected loop shape follows.
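
A self-contained schematic (hypothetical names, not the real scheduler code) 
of how a propose/commit cycle with no backoff can spin forever when commits 
keep failing near full capacity:

{code:java}
public class SchedulerLoopSketch {
  interface Allocator { String propose(); }            // stand-in for RegularContainerAllocator
  interface Committer { boolean tryCommit(String p); } // stand-in for the scheduler's commit path

  static void scheduleLoop(Allocator allocator, Committer committer) {
    String proposal;
    while ((proposal = allocator.propose()) != null) {
      if (committer.tryCommit(proposal)) {
        continue; // progress was made; move on to the next proposal
      }
      // "Failed to accept allocation proposal": nothing changes before the
      // next iteration (no backoff, nothing marked unschedulable), so with
      // an unchanged cluster state the same proposal is generated and
      // rejected again, forever -- matching the repeated log lines above.
    }
  }

  public static void main(String[] args) {
    // With a constant proposal and an always-failing commit, scheduleLoop
    // would never terminate, so the call is left commented out:
    // scheduleLoop(() -> "NODE_LOCAL@root", p -> false);
  }
}
{code}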

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-07-10 Thread Yongjun Zhang
Welcome Jonathan.

http://hadoop.apache.org/releases.html states:
"Hadoop is released as source code tarballs with corresponding binary
tarballs for convenience. "

and Andrew Wang said "The binary artifacts (including JARs) are technically
just convenience artifacts" and it seems not an uncommon practice to do
follow-up builds to release maven artifacts.

IIRC, Andrew once shared with me that starting in 3.x we use a single
build to do both release binary creation and maven artifact deployment;
prior releases used multiple builds:

Referring to https://wiki.apache.org/hadoop/HowToRelease

   - 3.x: step 4 in the "Creating the release candidate (X.Y.Z-RC)"
   section does both release binary creation and maven artifact deployment.
   - prior to 3.x: step 4 does release binary creation, and step 10 does
   maven artifact deployment, *each step does its own build, so two builds here*.
   As a matter of fact, I did not run step 10 for 3.0.3.

That said, I agree that ideally it's better to generate the release
binaries and deploy the maven artifacts from a single build.

Hope it helps. Welcome other folks to chime in.

Best,

--Yongjun






On Mon, Jul 9, 2018 at 2:08 PM, Jonathan Eagles  wrote:

> Thank you, Yongjun Zhang for resolving this issue for me. I have verified
> the 3.0.3 build is now working for me for tez to specify as a hadoop
> dependency.
>
> As for release procedure, can someone comment on what to do now that the
> artifacts published to maven are different than the voted on artifacts. I
> believe the source code is what is voted on and the maven artifacts are
> just for convenience, but would like an "official" answer.
>
> Reference:
> https://issues.apache.org/jira/browse/TEZ-3955
>
> Regards,
> jeagles
>
> On Mon, Jul 9, 2018 at 12:26 PM, Yongjun Zhang 
> wrote:
>
>> HI Jonathan,
>>
>> I have updated the artifacts, so now
>>
>> https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.2~~
>> https://repository.apache.org/#nexus-search;gav~org.apache.hadoop~~3.0.3~~
>>
>> are more consistent, except that 3.0.3 has an extra entry for rbf. Would
>> you please try again?
>>
>> The propagation to
>> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
>> will take some time. I did nothing different from last time, so fingers
>> crossed that it will propagate there.
>>
>> Thanks Sammi Chen and Andrew Wang for info and advice, and sorry for the
>> inconvenience again.
>>
>> Best,
>>
>> --Yongjun
>>
>> On Mon, Jul 2, 2018 at 9:30 AM, Jonathan Eagles 
>> wrote:
>>
>>> Release 3.0.3 is still broken due to the missing artifacts. Any update
>>> on when these artifacts will be published?
>>>
>>> On Wed, Jun 27, 2018 at 8:25 PM, Chen, Sammi 
>>> wrote:
>>>
 Hi Yongjun,

 The artifacts will be pushed to
 https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project
 after step 6 of the Publishing steps.

 For 2.9.1, I remember I definitely did that step before. I redid step 6
 today, and now 2.9.1 is pushed to the mvn repo.

 You can double-check it. I suspect Nexus may sometimes fail to notify the
 user when there are unexpected failures.





 Bests,

 Sammi

 *From:* Yongjun Zhang [mailto:yzh...@cloudera.com]
 *Sent:* Sunday, June 17, 2018 12:17 PM
 *To:* Jonathan Eagles ; Chen, Sammi <
 sammi.c...@intel.com>
 *Cc:* Eric Payne ; Hadoop Common <
 common-...@hadoop.apache.org>; Hdfs-dev ;
 mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
 *Subject:* Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)



 + Junping, Sammi



 Hi Jonathan,



 Many thanks for reporting the issues and sorry for the inconvenience.



 1. Shouldn't the build be looking for artifacts in



 https://repository.apache.org/content/repositories/releases

 rather than



 https://repository.apache.org/content/repositories/snapshots

 ?



 2.

 Not seeing the artifact published here as well.

 https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-project



 Indeed, I did not see 2.9.1 there too. So included Sammi Chen.



 Hi Junping, would you please share which step in

 https://wiki.apache.org/hadoop/HowToRelease

 should have done this?



 Thanks a lot.



 --Yongjun



 On Fri, Jun 15, 2018 at 10:52 PM, Jonathan Eagles 
 wrote:

 Upgraded Tez dependency to hadoop 3.0.3 and found this issue. Anyone
 else seeing this issue?



 [ERROR] Failed to execute goal on project hadoop-shim: Could not
 resolve dependencies for project 
 org.apache.tez:hadoop-shim:jar:0.10.0-SNAPSHOT:
 Failed to collect dependencies at 
 

[jira] [Created] (YARN-8512) ATSv2 entities are not published to HBase

2018-07-10 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8512:
---

 Summary: ATSv2 entities are not published to HBase
 Key: YARN-8512
 URL: https://issues.apache.org/jira/browse/YARN-8512
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Rohith Sharma K S


This is observed when the 1st attempt's master container dies and the 2nd 
attempt's master container is launched on an NM where old containers are 
running but the master container never ran.

||Attempt||NM1||NM2||Action||
|attempt-1|master container, i.e. container-1-1|container-1-2|master container 
died|
|attempt-2|NA|container-1-2 and master container container-2-1|NA|

In the above scenario, the NM doesn't identify the flowContext and logs the 
following:
{noformat}
2018-07-10 00:44:38,285 WARN  storage.HBaseTimelineWriterImpl 
(HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: 
flowName=null appId=application_1531175172425_0001 userId=hbase 
clusterId=yarn-cluster . Not proceeding with writing to hbase
2018-07-10 00:44:38,560 WARN  storage.HBaseTimelineWriterImpl 
(HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: 
flowName=null appId=application_1531175172425_0001 userId=hbase 
clusterId=yarn-cluster . Not proceeding with writing to hbase
{noformat}
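
The NM that hosts the attempt-2 master container never saw attempt-1's master 
container, so it has no flow context for the application, and the writer 
refuses to build a partial row key. A minimal sketch of the guard the warning 
implies (hypothetical shape; the actual check lives in 
HBaseTimelineWriterImpl#write):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RowKeyGuardSketch {
  private static final Log LOG = LogFactory.getLog(RowKeyGuardSketch.class);

  /** Every part of the timeline row key must be present before writing. */
  boolean canWriteToHBase(String clusterId, String userId,
      String flowName, String appId) {
    if (clusterId == null || userId == null || flowName == null
        || appId == null) {
      LOG.warn("Found null for one of: flowName=" + flowName
          + " appId=" + appId + " userId=" + userId
          + " clusterId=" + clusterId
          + " . Not proceeding with writing to hbase");
      return false; // a partial row key would strand the entity
    }
    return true;
  }
}
{code}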




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM

2018-07-10 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8511:
-

 Summary: When AM releases a container, RM removes allocation tags 
before it is released by NM
 Key: YARN-8511
 URL: https://issues.apache.org/jira/browse/YARN-8511
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.1.0
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Users leverage placement constraints with allocation tags to avoid port 
conflicts between apps, but we found they sometimes still get port conflicts. 
This is an issue similar to YARN-4148: the RM immediately removes allocation 
tags once AM#allocate asks to release a container, while the container on the 
NM takes some time before it is actually killed and its port released. We 
should let the RM remove allocation tags AFTER the NM confirms the containers 
are released. A schematic sketch of that ordering follows.
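
A schematic, self-contained sketch of the proposed ordering (event and method 
names are hypothetical stand-ins for the RM's allocation-tag bookkeeping, not 
actual YARN APIs):

{code:java}
public class TagRemovalSketch {
  enum EventType { RELEASE_REQUESTED_BY_AM, COMPLETED_REPORTED_BY_NM }

  interface TagStore { // stand-in for the RM-side allocation tags manager
    void removeContainerTags(String nodeId, String containerId);
  }

  /**
   * Remove a container's allocation tags only once the NM confirms the
   * container is gone; removing on the AM's release request alone re-opens
   * the port-conflict window described above.
   */
  static void onContainerEvent(EventType type, TagStore tags,
      String nodeId, String containerId) {
    switch (type) {
      case RELEASE_REQUESTED_BY_AM:
        // Too early: the process may still be alive and holding its port.
        break;
      case COMPLETED_REPORTED_BY_NM:
        tags.removeContainerTags(nodeId, containerId);
        break;
    }
  }
}
{code}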



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org