[jira] [Resolved] (YARN-4794) Deadlock in NMClientImpl

2016-04-12 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-4794.
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   2.8.0

> Deadlock in NMClientImpl
> 
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, 
> YARN-4794.2.patch
>
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title=app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}
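Hangs like this are typically caused by two threads acquiring the same pair of locks in opposite order, e.g. a shutdown path that holds a client-level lock while stopping containers versus a callback that holds a container-level lock while updating the client. A generic Java illustration of that pattern (an assumed shape for illustration only, not the actual NMClientImpl code):

{code:title=Generic lock-inversion sketch (not the actual NMClientImpl code)}
public class StopDeadlockDemo {
  private final Object clientLock = new Object();
  private final Object containerLock = new Object();

  // Shutdown thread: takes clientLock, then needs containerLock.
  void stopAllContainers() {
    synchronized (clientLock) {
      synchronized (containerLock) { /* send stop requests */ }
    }
  }

  // Callback thread: takes containerLock, then needs clientLock.
  void onContainerStatus() {
    synchronized (containerLock) {
      synchronized (clientLock) { /* update client bookkeeping */ }
    }
  }
}
{code}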



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2016-04-12 Thread Mahens (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238654#comment-15238654
 ] 

Mahens commented on YARN-2624:
--

Issue still persists in YARN 2.7.1 (HDP version 2.3). We see this issue 
intermittently, with the same error as above. Is there any Hadoop command to 
clear the cache directory?
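(For reference: there is no dedicated YARN command for this. The usual remedy is to stop the NodeManager and remove the stale local cache directories; a hedged sketch using the Hadoop local filesystem API, with the path assumed from the stack trace below rather than taken from any particular configuration:)

{code:title=Hedged cleanup sketch (stop the NM first; the path is an assumption)}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClearFilecache {
  public static void main(String[] args) throws Exception {
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    // Recursively delete the NM's local filecache; the NM repopulates it.
    localFs.delete(new Path("/data/yarn/nm/filecache"), true);
  }
}
{code}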

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found that resource localization fails on a cluster with the 
> following error in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}
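The failure above follows from the FileContext rename contract: a non-empty destination directory cannot be replaced, even with OVERWRITE, which is why a stale, partially populated cache directory wedges localization. A minimal self-contained reproduction on the local filesystem (temporary paths, not the NM's real cache layout):

{code:title=Minimal reproduction of the rename failure (hedged sketch)}
import java.util.EnumSet;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options.Rename;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RenameRepro {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getLocalFSFileContext();
    Path src = new Path("/tmp/rename-repro/src");
    Path dst = new Path("/tmp/rename-repro/dst");
    fc.mkdir(src, FsPermission.getDirDefault(), true);
    fc.mkdir(dst, FsPermission.getDirDefault(), true);
    // A leftover file makes dst non-empty, like a stale filecache entry.
    fc.create(new Path(dst, "stale"), EnumSet.of(CreateFlag.CREATE)).close();
    // Throws: "Rename cannot overwrite non empty destination directory ..."
    fc.rename(src, dst, Rename.OVERWRITE);
  }
}
{code}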



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4794) Deadlock in NMClientImpl

2016-04-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238653#comment-15238653
 ] 

Rohith Sharma K S commented on YARN-4794:
-

Committed to branch-2.7 as well. Thanks [~jianhe] for the patch, and thanks 
[~vinodkv] for the additional review! :-)

> Deadlock in NMClientImpl
> 
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, 
> YARN-4794.2.patch
>
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title=app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4794) Deadlock in NMClientImpl

2016-04-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238648#comment-15238648
 ] 

Rohith Sharma K S commented on YARN-4794:
-

+1 lgtm.

> Deadlock in NMClientImpl
> 
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, 
> YARN-4794.2.patch
>
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title=app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4633) TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk

2016-04-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238630#comment-15238630
 ] 

Rohith Sharma K S commented on YARN-4633:
-

No need to backport this JIRA. The issue originated from YARN-4584, which 
affects version 2.9 only.

> TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk 
> -
>
> Key: YARN-4633
> URL: https://issues.apache.org/jira/browse/YARN-4633
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.9.0
> Environment: Jenkins
>Reporter: Rohith Sharma K S
>Assignee: Bibin A Chundatt
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4633.patch
>
>
> Jenkins 
> [Build|https://builds.apache.org/job/PreCommit-YARN-Build/10366/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66.txt]
>  failed for the test case below: 
> {code}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
> support was removed in 8.0
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 455.808 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartAfterPreemption[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
>   Time elapsed: 60.145 sec  <<< FAILURE!
> java.lang.AssertionError: Attempt state is not correct (timedout): expected: 
> SCHEDULED actual: FAILED for the application attempt 
> appattempt_1453461355278_0001_04
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:197)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:172)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForAttemptScheduled(MockRM.java:831)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:818)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartAfterPreemption(TestRMRestart.java:2352)
> {code}
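The assertion fires in MockRM's polling wait, which repeatedly samples the attempt state until a deadline; a generic sketch of that pattern (an illustration, not the actual MockRM code):

{code:title=Generic wait-for-state polling sketch (not the actual MockRM code)}
import java.util.function.Supplier;

public class WaitFor {
  // Poll until the sampled state equals the expected one or time runs out;
  // on timeout the caller fails with "state is not correct (timedout)".
  public static <T> boolean waitFor(Supplier<T> state, T expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(state.get())) {
        return true;
      }
      Thread.sleep(100);
    }
    return false;
  }
}
{code}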



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-04-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238624#comment-15238624
 ] 

Allen Wittenauer commented on YARN-4734:


* We definitely need some clarification from ASF legal on whether we can merge 
licenses like that. My hunch is no, but IANAL.
* The dist and tmp directories should be inside target and not in the root of 
the module.  This makes a ton of other problems go away.
* Why is there a separate profile for this?  What UI do I get if I don't build 
with this profile? This also means the precommit hooks won't work until the 
hadoop personality is modified (which means the above precommit testing is 
mostly useless)
* Double check the license headers.  At least one of 'em was using the old text.
* Why isn't YarnUI2.md's content in BUILDING.txt?  Why does an *end user* care 
about this information?  Also, heads up to [~andrew.wang] since he is looking 
to cut a release off of trunk relatively soon.  This may have to get jettisoned 
before the cut.
* The Apache RAT excludes files that don't or shouldn't exist (e.g., travis.yml)
* The Apache RAT excludes files that actually have a license.
* Why does "hadoop-yarn-ui/src/main/resources/META-INF/NOTICE.txt" mention Tez? 
 Why is this file even there?
* hadoop-yarn-ui/src/main/webapp/package.json should have its version pulled 
from maven.  Let's not repeat past mistakes like we did with libhadoop.so 
getting some random version number.




> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, 
> YARN-4734.4.patch, YARN-4734.5.patch
>
>
> The YARN-2928 branch is planned to be merged back to trunk shortly; it 
> depends on the changes in YARN-3368. This JIRA tracks the merge task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2016-04-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238616#comment-15238616
 ] 

Rohith Sharma K S commented on YARN-4953:
-

Even though this scenario is very rare, I am thinking one step ahead, since it 
would affect applications with a large number of containers.
Thoughts?

> Delete completed container log folder when rolling log aggregation is enabled
> -
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> There is a potential bottleneck when a cluster runs a very large number of 
> containers for a single application on the same NodeManager. Linux limits 
> the subdirectory count to 32K, so if the number of containers for an 
> application exceeds 32K, container launches fail, and at that point no more 
> containers can be launched on the node.
> Currently, log folders are deleted after the app is finished, while rolling 
> log aggregation aggregates logs to HDFS periodically. 
> I think that once aggregation is completed for finished containers, cleanup 
> can be done, i.e. deleting the log folders of finished containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2016-04-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238596#comment-15238596
 ] 

Rohith Sharma K S commented on YARN-4953:
-

cc: [~jlowe]

> Delete completed container log folder when rolling log aggregation is enabled
> -
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> There is a potential bottleneck when a cluster runs a very large number of 
> containers for a single application on the same NodeManager. Linux limits 
> the subdirectory count to 32K, so if the number of containers for an 
> application exceeds 32K, container launches fail, and at that point no more 
> containers can be launched on the node.
> Currently, log folders are deleted after the app is finished, while rolling 
> log aggregation aggregates logs to HDFS periodically. 
> I think that once aggregation is completed for finished containers, cleanup 
> can be done, i.e. deleting the log folders of finished containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2016-04-12 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4953:
---

 Summary: Delete completed container log folder when rolling log 
aggregation is enabled
 Key: YARN-4953
 URL: https://issues.apache.org/jira/browse/YARN-4953
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


There is a potential bottleneck when a cluster runs a very large number of 
containers for a single application on the same NodeManager. Linux limits the 
subdirectory count to 32K, so if the number of containers for an application 
exceeds 32K, container launches fail, and at that point no more containers can 
be launched on the node.

Currently, log folders are deleted after the app is finished, while rolling log 
aggregation aggregates logs to HDFS periodically. 

I think that once aggregation is completed for finished containers, cleanup can 
be done, i.e. deleting the log folders of finished containers. 
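A hedged sketch of the proposed cleanup step, deleting a finished container's local log directory once its logs have been uploaded (the helper and directory layout are assumptions, not the actual NodeManager code):

{code:title=Hypothetical cleanup helper (not the actual NodeManager code)}
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class ContainerLogCleaner {
  // Delete <log-dir-root>/<appId>/<containerId> after rolling aggregation
  // has uploaded that container's logs to HDFS.
  public static void deleteContainerLogDir(String logDirRoot, String appId,
      String containerId) throws IOException {
    FileContext lfs = FileContext.getLocalFSFileContext();
    lfs.delete(new Path(logDirRoot, appId + "/" + containerId), true);
  }
}
{code}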



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-04-12 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238478#comment-15238478
 ] 

Daniel Zhi commented on YARN-4676:
--

1. I don't expect it to disappear by the next patch, but I will focus on other 
issues first.
2. I will revert these two files (I didn't notice them because my local diff 
tool skipped empty changes).
3. I will restore the resolve() (it was lost during my manual merge).
4. Yes, it will simplify the code.
5. refreshNodes(long timeout) basically remains unchanged. The client enforces 
a timeout which is not fully integrated with the automatic logic on the RM 
side (NodesListManager uses the internal default timeout of 3600 seconds). 
Given that the code checks status every second, it likely expects a smaller 
timeout from the command line, so the command-line timeout experience would be 
the same as before. A deeper integration would pass the timeout through 
RefreshNodesRequest to NodesListManager so it can honor it. The client-side 
wait-and-check can still be there, but with no need for a FORCEFUL 
decommission, as that is supposed to happen automatically (see the sketch 
after this list).
6. I am surprised that update() no longer throws an exception (maybe the code 
evolved since the original version), so I will remove updateNoThrow() (and 
will log the full exception in readDecommissioningTimeout).
7. I will add synchronized. It will be called by every node during every 
heartbeat, but the implementation is efficient enough to avoid contention due 
to synchronized. 
8. Is there a list of what "docs" includes?
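A sketch of the client-side wait-and-check described in point 5, with hypothetical helper stubs (the deeper RM-side integration would make the FORCEFUL fallback unnecessary):

{code:title=Client-side wait-and-check sketch (hypothetical helpers)}
public class DecommissionWaiter {
  // Poll node state once per second; fall back to FORCEFUL on timeout.
  boolean waitForDecommission(String node, long timeoutSec)
      throws InterruptedException {
    for (long i = 0; i < timeoutSec; i++) {
      if (isDecommissioned(node)) {
        return true;
      }
      Thread.sleep(1000L);
    }
    forcefulDecommission(node);
    return false;
  }

  boolean isDecommissioned(String node) { return false; } // hypothetical status check
  void forcefulDecommission(String node) { }               // hypothetical forced fallback
}
{code}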


> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch, YARN-4676.009.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING nodes' status automatically and asynchronously after the 
> client/admin makes a graceful decommission request. It tracks 
> DECOMMISSIONING node status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the 
> DECOMMISSIONED state. NodesListManager detects and handles include and 
> exclude list changes to kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238407#comment-15238407
 ] 

Sangjin Lee commented on YARN-3816:
---

Thanks [~gtCarrera9] for the quick update!

As for the new metric type (i.e. base type + "_" + contributing child entity 
type), I do see the rationale (or need) to distinguish aggregation coming from 
different entities. We should still note that the metric would show up 
somewhat awkwardly if we read the applications via queries: aggregated metrics 
would look like "MEMORY_YARN_CONTAINER", for example. I'm not quite sure 
whether there would be additional issues.

Also, I think we should be really judicious in permitting the aggregation. The 
most important case should be YARN container-to-app. For per-framework 
metrics, AMs should handle internal aggregations themselves and simply add the 
results to the application, as they usually have the app-level metrics already 
anyway. That should be the main way to support them.

(TimelineMetric.java)
- l.244: “accumulated” -> “aggregated”?

(AppLevelTimelineCollector.java)
- l.126: typo: “teal-time” -> “real-time”

(TimelineCollector.java)
- l.83, 87: since these methods expose internals of the {{TimelineCollector}} 
class, I would make them {{protected}} to ensure only subclasses can use them
- l. 171: I could suggest one more optimization in terms of memory footprint. 
If the given entity does not have metrics, then we can/should skip the entire 
aggregation status step.
- l.230: It should be {{putIfAbsent()}}. Otherwise, {{put()}} would simply 
overwrite the value even if one already exists, resulting in an incorrect 
object being used (see the sketch after these review notes).

(ApplicationColumnPrefix.java)
- l.214: per comments on the JIRA, this new {{store()}} method should be 
removed, right?

I would encourage others to take a closer look at this too. Thanks!
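On the l.230 point, a self-contained illustration of why {{putIfAbsent()}} matters with concurrent maps (generic code, not the actual TimelineCollector):

{code:title=put() vs. putIfAbsent() under a race (generic illustration)}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class PutIfAbsentDemo {
  private final ConcurrentMap<String, LongAdder> aggregates =
      new ConcurrentHashMap<>();

  void recordWrong(String key) {
    // Racy: a concurrent put() replaces an instance another thread is
    // already updating, silently dropping its counts.
    aggregates.put(key, new LongAdder());
    aggregates.get(key).increment();
  }

  void recordRight(String key) {
    // Correct: keep whichever instance won the race and update that one.
    LongAdder fresh = new LongAdder();
    LongAdder existing = aggregates.putIfAbsent(key, fresh);
    (existing == null ? fresh : existing).increment();
  }
}
{code}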

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users aggregated state for each application, including 
> resource (CPU, memory) consumption across all containers, the number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of state at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be done more efficiently 
> based on application-level aggregations rather than on raw entity-level 
> data, as far fewer rows need to be scanned (after filtering out 
> non-aggregated entities such as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4878) Expose scheduling policy and max running apps over JMX for Yarn queues

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238383#comment-15238383
 ] 

Hadoop QA commented on YARN-4878:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 38 unchanged - 0 fixed = 39 total (was 38) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 153m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.webapp.TestRMWithCSRFFilter |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
| JDK v1.8.0_77 Timed out junit tests | 

[jira] [Commented] (YARN-4366) Fix Lint Warnings in YARN Common

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238359#comment-15238359
 ] 

Hadoop QA commented on YARN-4366:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 11s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12772862/YARN-4366.001.patch |
| JIRA Issue | YARN-4366 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4e74a152a4fb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4366) Fix Lint Warnings in YARN Common

2016-04-12 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238322#comment-15238322
 ] 

Robert Kanter commented on YARN-4366:
-

We should verify that this doesn't break anything.  As explained in [this 
StackOverflow|http://stackoverflow.com/questions/5401537/i-have-got-this-warning-non-varargs-call-of-varargs-method-with-inexact-argumen],
 there's a difference between something like {{cls.getMethod(action, null);}} 
and something like {{cls.getMethod(action);}}.  The latter constructs an empty 
array, while the former is ambiguous as to whether it passes a single {{null}} 
instance or an array with a single {{null}} element (hence the warning).

Unfortunately, besides using reflection, the code is very generic, so it's not 
straightforward to track down what it's being called on and what those callers 
expect here.
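A self-contained demonstration of the ambiguity (the casts are exactly what the lint warning suggests):

{code:title=Varargs ambiguity demo}
import java.lang.reflect.Method;

public class VarargsDemo {
  public static void main(String[] args) throws Exception {
    Class<?> cls = String.class;
    // Bare null is ambiguous: is it a Class<?>[] or a single null element?
    // javac warns; an explicit cast makes the intent clear.
    Method m1 = cls.getMethod("toString", (Class<?>[]) null); // non-varargs call
    Method m2 = cls.getMethod("toString");                    // empty varargs
    // On OpenJDK both resolve to the same no-arg method.
    System.out.println(m1 + "\n" + m2);
  }
}
{code}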

> Fix Lint Warnings in YARN Common
> 
>
> Key: YARN-4366
> URL: https://issues.apache.org/jira/browse/YARN-4366
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-4366.001.patch
>
>
> {noformat}
> [WARNING] 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java:[100,45]
>  non-varargs call of varargs method with inexact argument type for last 
> parameter;
>   cast to java.lang.Class for a varargs call
>   cast to java.lang.Class[] for a non-varargs call and to suppress this 
> warning
> [WARNING] 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factory/providers/RpcFactoryProvider.java:[62,46]
>  non-varargs call of varargs method with inexact argument type for last 
> parameter;
>   cast to java.lang.Class for a varargs call
>   cast to java.lang.Class[] for a non-varargs call and to suppress this 
> warning
> [WARNING] 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factory/providers/RpcFactoryProvider.java:[64,34]
>  non-varargs call of varargs method with inexact argument type for last 
> parameter;
>   cast to java.lang.Object for a varargs call
>   cast to java.lang.Object[] for a non-varargs call and to suppress this 
> warning
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238294#comment-15238294
 ] 

Hadoop QA commented on YARN-3150:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 28s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
15s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 18m 48s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798401/YARN-3150-YARN-2928.01.patch
 |
| JIRA Issue | YARN-3150 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux e6373e4ee781 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2928 / 3df8b0d |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11057/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> [Documentation] Documenting the timeline service v2
> ---
>
> Key: YARN-3150
> URL: https://issues.apache.org/jira/browse/YARN-3150
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch
>
>
> Let's make sure we will have a document to describe what's new in TS v2, the 
> APIs, the client libs and so on. We should do better around documentation in 
> v2 than v1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY

2016-04-12 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238268#comment-15238268
 ] 

Zhe Zhang commented on YARN-2694:
-

Thanks a lot Wangda for the clear explanation! I think YARN-4140 is what we 
need.

> Ensure only single node labels specified in resource request / host, and node 
> label expression only specified when resourceName=ANY
> ---
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, 
> YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, 
> YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, 
> YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, 
> YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, 
> YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, 
> YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, 
> YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially complete. A node label expression specified in a ResourceRequest 
> is only respected when it is specified at the ANY level, and a 
> ResourceRequest/host with multiple node labels makes user-limit and similar 
> computations trickier.
> For now we need to temporarily disable them; changes include:
> - AMRMClient
> - ApplicationMasterService
> - RMAdminCLI
> - CommonNodeLabelsManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2

2016-04-12 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3150:
--
Attachment: YARN-3150-YARN-2928.01.patch

Posted patch v.1.

I created a separate document for v.2 and added a link to it from the 
existing TS doc. Could you please review it for correctness and (reasonable) 
completeness? Since we're still making changes, this doc is not going to be 
totally complete, but it should contain the information necessary for people 
to get started.

> [Documentation] Documenting the timeline service v2
> ---
>
> Key: YARN-3150
> URL: https://issues.apache.org/jira/browse/YARN-3150
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch
>
>
> Let's make sure we will have a document to describe what's new in TS v2, the 
> APIs, the client libs and so on. We should do better around documentation in 
> v2 than v1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2

2016-04-12 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3150:
--
Attachment: TimelineServiceV2.html

Documentation in HTML.

> [Documentation] Documenting the timeline service v2
> ---
>
> Key: YARN-3150
> URL: https://issues.apache.org/jira/browse/YARN-3150
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: TimelineServiceV2.html
>
>
> Let's make sure we will have a document to describe what's new in TS v2, the 
> APIs, the client libs and so on. We should do better around documentation in 
> v2 than v1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4878) Expose scheduling policy and max running apps over JMX for Yarn queues

2016-04-12 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4878:
---
Attachment: YARN-4878.002.patch

> Expose scheduling policy and max running apps over JMX for Yarn queues
> --
>
> Key: YARN-4878
> URL: https://issues.apache.org/jira/browse/YARN-4878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-4878.001.patch, YARN-4878.002.patch
>
>
> There are two things that are not currently visible over JMX: the current 
> scheduling policy for a queue, and the number of max running apps. It would 
> be great if these could be exposed over JMX as well.
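For context, queue metrics become visible over JMX once they are registered as a Hadoop metrics2 source; a hedged sketch of that mechanism with illustrative names (not the actual patch, which may expose the values differently):

{code:title=Hedged sketch: a gauge visible over JMX via Hadoop metrics2}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(context = "yarn")
public class QueueMetricsSketch {
  @Metric("Max number of running apps for the queue")
  MutableGaugeInt maxRunningApps; // instantiated by the metrics system

  public static void main(String[] args) {
    DefaultMetricsSystem.initialize("ResourceManager");
    QueueMetricsSketch m = DefaultMetricsSystem.instance()
        .register("QueueMetricsSketch", "demo queue metrics",
            new QueueMetricsSketch());
    m.maxRunningApps.set(50); // now readable through the source's JMX MBean
  }
}
{code}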



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4878) Expose scheduling policy and max running apps over JMX for Yarn queues

2016-04-12 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238150#comment-15238150
 ] 

Yufei Gu commented on YARN-4878:


Hi [~kasha], thanks for the review.
1. I tried {{scheduler.allocConf}}, but lots of test cases failed. I figured 
out the failures and fixed them in the second patch.
2. I did this to mimic the existing code. If we are going to consolidate both 
of them, can we do it in a follow-up JIRA?

> Expose scheduling policy and max running apps over JMX for Yarn queues
> --
>
> Key: YARN-4878
> URL: https://issues.apache.org/jira/browse/YARN-4878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-4878.001.patch
>
>
> There are two things that are not currently visible over JMX: the current 
> scheduling policy for a queue, and the number of max running apps. It would 
> be great if these could be exposed over JMX as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3816:

Attachment: YARN-3816-YARN-2928-v6.patch

OK, here is the v6 version of the patch. It addresses most of Sangjin's 
comments and removes some unnecessary code. Specifically, a few things I 
addressed in ways other than Sangjin's suggestions:

- I did not move the aggregation logic to the app-level collector completely. 
Instead, I left the code infrastructure in TimelineCollector but moved the 
logic that launches the aggregation into the app-level collector. In this way, 
we keep the aggregation infrastructure fairly general for future collectors 
(like the rack-level collector proposed by Vinod a while ago) but can have 
specific designs for app-level aggregations. 
- With regard to the result of the aggregations, I store them in the 
application entity with an entity id equal to the application id. The id for 
each aggregated metric is the original metric id plus the aggregation group. 
Note that I think we need to keep the "aggregation group" information in the 
metric id because we may have multiple types of entities all posting the same 
metric name (especially if there are user-defined metrics posted by the 
application itself), and we may not want to aggregate them together. 
- I refactored RealTimeAggregationOperation into TimelineMetricOperations. My 
intuition here is that we can provide a basic framework to define operations 
between timeline metrics, whether it's an aggregation operation or an 
accumulation operation. Right now the input of a timeline metric operation is 
the incoming metric, the existing metric, and the previous state; the output 
is a new timeline metric, and any side effect is reflected on the state. In 
this way we can model aggregation operations like SUM and AVG (not supported 
yet) and accumulation operations like REPLACE and MAX (see the sketch below). 
- I changed the code so that we're not storing the metric aggregation 
operation. I'll rebuild the operations for offline aggregation through a 
config, and will address that in YARN-3817. Right now, this patch lives well 
with the new filter mechanism. 

Please do let me know if there are other concerns, thanks! 
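A compact sketch of the operation contract described above, i.e. (incoming, existing, state) -> new value (an assumed shape; the real TimelineMetricOperations signature may differ):

{code:title=Assumed shape of a timeline metric operation (illustrative only)}
import java.util.Map;

public class MetricOpsSketch {
  // Combine an incoming value with the existing aggregate; side effects
  // (e.g. remembering a previous value) go through the state map.
  interface MetricOperation {
    long apply(long incoming, long existing, Map<String, Long> state);
  }

  static final MetricOperation SUM = (in, ex, st) -> ex + in;
  static final MetricOperation MAX = (in, ex, st) -> Math.max(ex, in);
  static final MetricOperation REPLACE = (in, ex, st) -> in;
}
{code}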

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users aggregated state for each application, including 
> resource (CPU, memory) consumption across all containers, the number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of state at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be done more efficiently 
> based on application-level aggregations rather than on raw entity-level 
> data, as far fewer rows need to be scanned (after filtering out 
> non-aggregated entities such as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237927#comment-15237927
 ] 

Hadoop QA commented on YARN-4950:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 25m 25s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 37s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 34s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 101m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | 

[jira] [Updated] (YARN-4951) large IP ranges require the creation of multiple reverse lookup zones

2016-04-12 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4951:
-
Attachment: 0001-YARN-4757-address-multiple-reverse-lookup-zones-and-.patch

An approach that:

1)  Adds config properties for the netmask and the IP range min and max.
2)  Uses SubnetUtils to find the list of addresses based on the subnet and 
mask, selects the network addresses (ending in ".0") from the resulting list, 
and allows the addresses to be filtered based on the range values (see the 
sketch below).
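A minimal sketch of step 2 with commons-net's SubnetUtils (the example subnet is an assumption; range filtering is omitted):

{code:title=Enumerating /24 network addresses inside a large subnet (sketch)}
import org.apache.commons.net.util.SubnetUtils;

public class ReverseZoneSketch {
  public static void main(String[] args) {
    SubnetUtils subnet = new SubnetUtils("172.17.0.0", "255.255.224.0");
    subnet.setInclHostCount(true); // include network/broadcast addresses
    // Each x.y.z.0 network address corresponds to one reverse lookup zone.
    for (String addr : subnet.getInfo().getAllAddresses()) {
      if (addr.endsWith(".0")) {
        System.out.println(addr); // 172.17.0.0, 172.17.1.0, ..., 172.17.31.0
      }
    }
  }
}
{code}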

> large IP ranges require the creation of multiple reverse lookup zones
> -
>
> Key: YARN-4951
> URL: https://issues.apache.org/jira/browse/YARN-4951
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: 
> 0001-YARN-4757-address-multiple-reverse-lookup-zones-and-.patch
>
>
> Large subnet definitions (e.g. specifying a mask value of 255.255.224.0) 
> yield a large number of potential network addresses, each requiring a 
> separate reverse zone definition (given that reverse zones include the first 
> 3 IP bytes in reverse order).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4952) need configuration mechanism for specifying per-host network interface

2016-04-12 Thread Jonathan Maron (JIRA)
Jonathan Maron created YARN-4952:


 Summary: need configuration mechanism for specifying per-host 
network interface
 Key: YARN-4952
 URL: https://issues.apache.org/jira/browse/YARN-4952
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Maron
Assignee: Jonathan Maron


The initial configuration approach for the DNS service specified a bind-address 
that designated the network interface to which the service should bind its 
listener port.  However, there is a need to potentially run multiple DNS 
service instances (an HA approach) and therefore a need to specify bind 
addresses for each instance (and those interfaces may vary between hosts).  
This may take a form similar to the RM HA approach (rm1, rm2).
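A hedged sketch of what per-instance resolution might look like (the property name and instance id are assumptions, by analogy with the rm1/rm2 keys):

{code:title=Hypothetical per-instance bind-address lookup}
import org.apache.hadoop.conf.Configuration;

public class BindAddressSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String instanceId = "dns1"; // hypothetical id, analogous to rm1/rm2
    // Hypothetical key; the default binds to all interfaces.
    String bindAddress =
        conf.get("hadoop.registry.dns.bind-address." + instanceId, "0.0.0.0");
    System.out.println(bindAddress);
  }
}
{code}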



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-04-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237761#comment-15237761
 ] 

Allen Wittenauer edited comment on YARN-4950 at 4/12/16 6:50 PM:
-

-00:
* naive copy parallel-tests from hadoop-common's pom.xml into yarn-client and 
yarn-server-resourcemanager


was (Author: aw):
-00:
* copy parallel-tests from hadoop-common's pom.xml into yarn-client and 
yarn-server-resourcemanager

> configure parallel-tests for yarn-client and yarn-server-resourcemanager
> 
>
> Key: YARN-4950
> URL: https://issues.apache.org/jira/browse/YARN-4950
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: YARN-4950.00.patch
>
>
> Unit tests for yarn-client and yarn-server-resourcemanager take over an hour 
> each.  The parallel-tests profile should be configured to reduce the 
> execution time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-04-12 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4950:
---
Attachment: YARN-4950.00.patch

-00:
* copy parallel-tests from hadoop-common's pom.xml into yarn-client and 
yarn-server-resourcemanager

> configure parallel-tests for yarn-client and yarn-server-resourcemanager
> 
>
> Key: YARN-4950
> URL: https://issues.apache.org/jira/browse/YARN-4950
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: YARN-4950.00.patch
>
>
> Unit tests for yarn-client and yarn-server-resourcemanager take over an hour 
> each.  The parallel-tests profile should be configured to reduce the 
> execution time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4951) large IP ranges require the creation of multiple reverse lookup zones

2016-04-12 Thread Jonathan Maron (JIRA)
Jonathan Maron created YARN-4951:


 Summary: large IP ranges require the creation of multiple reverse 
lookup zones
 Key: YARN-4951
 URL: https://issues.apache.org/jira/browse/YARN-4951
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Maron
Assignee: Jonathan Maron


Large subnet definitions (e.g. specifying a mask value of 255.255.224.0) yield 
a large number of potential network addresses, each requiring a separate 
reverse zone definition (given that reverse zones include the first 3 IP bytes 
in reverse order).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-04-12 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4757:
-
Attachment: 0001-YARN-4757-Initial-code-submission-for-DNS-Service.patch

While I await branch-committer status, I am uploading an initial patch that 
should give a more concrete sense of a DNS service implementation.  I made use 
of the dnsjava library and implemented a good portion of the specification.   I 
plan to provide relatively frequent updates and record sub-tasks to address any 
issues unearthed.
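For a sense of the consumer side that such a service enables, a minimal dnsjava SRV lookup (the record name is a made-up example; the actual naming scheme is defined in the design doc):

{code:title=Minimal dnsjava SRV lookup (example name only)}
import org.xbill.DNS.Lookup;
import org.xbill.DNS.Record;
import org.xbill.DNS.SRVRecord;
import org.xbill.DNS.Type;

public class DiscoverySketch {
  public static void main(String[] args) throws Exception {
    Record[] records = new Lookup("_hbase._tcp.example.com", Type.SRV).run();
    if (records != null) {
      for (Record r : records) {
        SRVRecord srv = (SRVRecord) r;
        System.out.println(srv.getTarget() + ":" + srv.getPort());
      }
    }
  }
}
{code}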

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
> Attachments: 
> 0001-YARN-4757-Initial-code-submission-for-DNS-Service.patch, YARN-4757- 
> Simplified discovery of services via DNS mechanisms.pdf
>
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify the access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of dependent end-points of a service is not easy to implement 
> using the present registry-read mechanisms, *without* code changes to 
> existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service Discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS based mechanism but left it as a 
> future task. (Task) Having the registry information exposed via DNS 
> simplifies the life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-04-12 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4950:
---
Affects Version/s: 3.0.0

> configure parallel-tests for yarn-client and yarn-server-resourcemanager
> 
>
> Key: YARN-4950
> URL: https://issues.apache.org/jira/browse/YARN-4950
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Priority: Critical
>
> Unit tests for yarn-client and yarn-server-resourcemanager take over an hour 
> each.  The parallel-tests profile should be configured to reduce the 
> execution time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager

2016-04-12 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-4950:
--

 Summary: configure parallel-tests for yarn-client and 
yarn-server-resourcemanager
 Key: YARN-4950
 URL: https://issues.apache.org/jira/browse/YARN-4950
 Project: Hadoop YARN
  Issue Type: Test
  Components: test
Reporter: Allen Wittenauer
Priority: Critical


Unit tests for yarn-client and yarn-server-resourcemanager take over an hour 
each.  The parallel-tests profile should be configured to reduce the execution 
time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4794) Deadlock in NMClientImpl

2016-04-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4794:
--
Attachment: YARN-4794-branch-2.7.patch

branch-2.7 patch attached. [~rohithsharma], could you review? Thanks!

> Deadlock in NMClientImpl
> 
>
> Key: YARN-4794
> URL: https://issues.apache.org/jira/browse/YARN-4794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, 
> YARN-4794.2.patch
>
>
> Distributed shell app gets stuck on stopping containers after App completes 
> with the following exception
> {code:title = app log}
> 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.nio.channels.ClosedByInterruptException
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237641#comment-15237641
 ] 

Hadoop QA commented on YARN-4909:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 2s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| 

[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237628#comment-15237628
 ] 

Li Lu commented on YARN-3816:
-

Thanks for the pointer, Sangjin! Sure, let's not use column names for 
aggregation op storage. I can add a config key so that we can rebuild the 
aggregation operation from the config. 
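
A rough sketch of what such a configuration-driven mapping could look like 
(the key name and enum below are purely illustrative, not from any patch):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class AggregationOpConfig {
  // Hypothetical key holding metric-name=op pairs, e.g.
  // "HDFS_BYTES_READ=SUM,MEMORY=AVG".
  static final String KEY = "yarn.timeline-service.metric-aggregation-ops";

  enum AggregationOp { NOP, SUM, AVG, MAX }

  /** Parses the config value; unlisted metrics default to NOP. */
  static Map<String, AggregationOp> parse(String value) {
    Map<String, AggregationOp> ops = new HashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return ops;
    }
    for (String pair : value.split(",")) {
      String[] kv = pair.trim().split("=");
      ops.put(kv[0], AggregationOp.valueOf(kv[1]));
    }
    return ops;
  }

  public static void main(String[] args) {
    System.out.println(parse("HDFS_BYTES_READ=SUM,MEMORY=AVG"));
  }
}
{code}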

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237606#comment-15237606
 ] 

Sangjin Lee commented on YARN-3816:
---

Sorry I missed the column post-fix part earlier in my review.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237602#comment-15237602
 ] 

Sangjin Lee commented on YARN-3816:
---

We discussed the cases where we may need to support adding more info for the 
metrics on YARN-4053. Especially see [this 
comment|https://issues.apache.org/jira/browse/YARN-4053?focusedCommentId=14994603=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14994603]
 (although going over the full discussion is informative). The conclusion was 
that it would be good not to store additional metadata as column pre- or 
post-fixes due to the complications mentioned in YARN-4053. If we can find a 
way to avoid that here, it would be ideal. If this is to support offline 
aggregation, options like separate configuration were also discussed.

If we end up storing that metadata in HBase, one thing we should *definitely* 
avoid is needing to read it back in order to do any writes. We're ruling out 
read-then-write as a principle; otherwise it would open up a world of pain in 
terms of both performance and correctness.
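
To make the principle concrete, a sketch of the two write patterns (row and 
column names hypothetical): the first needs an extra read RPC and loses 
updates under concurrency; the second lets HBase apply the delta server-side.

{code:java}
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePatterns {
  static final byte[] ROW = Bytes.toBytes("app_0001");
  static final byte[] CF  = Bytes.toBytes("m");
  static final byte[] COL = Bytes.toBytes("HDFS_BYTES_READ");

  // Anti-pattern: read-then-write. Extra round-trip, and two
  // concurrent writers can silently drop each other's updates.
  static void readThenWrite(Table table, long delta) throws Exception {
    long current = Bytes.toLong(table.get(new Get(ROW)).getValue(CF, COL));
    table.put(new Put(ROW).addColumn(CF, COL, Bytes.toBytes(current + delta)));
  }

  // Preferred: server-side atomic increment, no client read at all.
  static void writeOnly(Table table, long delta) throws Exception {
    table.incrementColumnValue(ROW, CF, COL, delta);
  }
}
{code}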

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4939) the decommissioning Node should keep alive if NM restart

2016-04-12 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237599#comment-15237599
 ] 

Daniel Templeton commented on YARN-4939:


Thanks, [~sandflee].  The patch looks good to me.  Could you please add tests 
to cover the scenario the patch addresses?

> the decommissioning Node should keep alive  if NM restart
> -
>
> Key: YARN-4939
> URL: https://issues.apache.org/jira/browse/YARN-4939
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4939.01.patch, YARN-4939.02.patch
>
>
> 1, gracefully decommission a node A
> 2, restart node A
> 3, node A could not register to RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2016-04-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237578#comment-15237578
 ] 

Eric Payne commented on YARN-2113:
--

bq. I planned to work on YARN-4781 soon but I'm working on other stuffs so at 
least I will not be able to work on it in recent 1-2 months. Please feel free 
to take over.
Thanks, [~leftnoteasy]. Sure, I would like to drive this if that's okay.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4949) [YARN-3368] Support pagination for RM/ATS Web UI applications page.

2016-04-12 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4949:
---

 Summary: [YARN-3368] Support pagination for RM/ATS Web UI 
applications page.
 Key: YARN-4949
 URL: https://issues.apache.org/jira/browse/YARN-4949
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Reporter: Rohith Sharma K S


It is obvious that users would expect pagination for the applications page in 
the RM/ATS web UI. The old RM/ATS web UIs have the limitation that they take a 
lot of time to render all applications in the browser. 
It would be good to support batch retrieval of applications from the server 
rather than retrieving all the applications at once.
This requires a lot of things to be considered on both the RMWebApp and the 
RM/ATS server end.
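
As a sketch of the client side: the RM REST API already lets callers cap the 
result size with the {{limit}} parameter; true server-side pagination (a 
cursor or offset) is the part this JIRA would add. The host name below is 
hypothetical:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PagedAppsFetch {
  public static void main(String[] args) throws Exception {
    // Fetch at most 50 applications instead of the whole list.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps?limit=50");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream()))) {
      in.lines().forEach(System.out::println);
    }
  }
}
{code}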



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237548#comment-15237548
 ] 

Li Lu commented on YARN-3816:
-

Thanks [~varun_saxena] and [~sjlee0]! My bottom line is that we may want to 
store some metadata for some timeline metrics. How to perform aggregation is 
one piece of metadata we want to keep. We need it so that for offline 
aggregations, like user- and flow-level offline aggregation, we can read out 
the aggregation operation. Is it OK to reserve a separate column for each 
metric to store its metadata (like _META)? We can skip it if the 
aggregation operation is NOP for now. Thoughts? 

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237533#comment-15237533
 ] 

Daniel Templeton commented on YARN-4940:


I agree with [~kshukla] on both counts.  The fix seems sound, and I like that 
it gets rid of the extra {{UnknownNodeId}} class.  The patch needs to add 
tests covering the scenario that caused the issue.
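
As a sketch of the direction (not the actual patch), the custom subclass can 
be replaced with a plain NodeId, which converts to protobuf cleanly:

{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;

public class UnknownNodeIdExample {
  public static void main(String[] args) {
    // A private NodeId subclass cannot be merged into NodeIdPBImpl
    // and triggers the ClassCastException quoted below. A regular
    // NodeId works; port -1 marking "unknown" is an assumption here.
    NodeId unknown = NodeId.newInstance("decommissioned-host", -1);
    System.out.println(unknown);
  }
}
{code}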

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch
>
>
> 1,   add a node to exclude file
> 2,   start RM
> 3,   run yarn  node -list -all , see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-12 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4514:
--
Attachment: YARN-4514-YARN-3368.7.patch

Fixed ASF license warnings.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, 
> YARN-4514-YARN-3368.6.patch, YARN-4514-YARN-3368.7.patch
>
>
> We have several configurations that are hard-coded, for example RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4932) [Umbrella] YARN/MR test failures on Windows

2016-04-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4932:
--
Summary: [Umbrella] YARN/MR test failures on Windows  (was: (Umbrella) 
YARN/MR test failures on Windows)

> [Umbrella] YARN/MR test failures on Windows
> ---
>
> Key: YARN-4932
> URL: https://issues.apache.org/jira/browse/YARN-4932
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Junping Du
>
> We found several test failures related to Windows. Here is Umbrella jira to 
> track them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon

2016-04-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237489#comment-15237489
 ] 

Vinod Kumar Vavilapalli commented on YARN-4928:
---

[~djp], can this be put on older releases too - 2.8.x, 2.7.x etc?

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test 
> root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4928.001.patch, YARN-4928.002.patch, 
> YARN-4928.003.patch, YARN-4928.004.patch, YARN-4928.005.patch, 
> YARN-4928.006.patch
>
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and 
> yarn.server.timeline.TestLogInfo.* fail on Windows, because they are 
> attempting to use a test root paths like 
> "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo",
>  which contains a ":" (after the Windows drive letter) and 
> DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest to use the same 
> approach: using "/tmp/..." as test root dir instead of 
> System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).
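
A minimal sketch of the suggested change (constant names illustrative):

{code:java}
public class TestRootDirExample {
  // Before: on Windows this resolves to something like
  // /C:/hdp/.../target/test-dir, and DFSUtil.isValidName()
  // rejects the ":" after the drive letter.
  static final String BROKEN_ROOT = System.getProperty(
      "test.build.data", System.getProperty("java.io.tmpdir"));

  // After (mirroring HDFS-6189): a fixed, drive-letter-free root,
  // used only as a path inside the mini DFS cluster.
  static final String PORTABLE_ROOT = "/tmp/TestEntityGroupFSTimelineStore";

  public static void main(String[] args) {
    System.out.println(BROKEN_ROOT + " -> " + PORTABLE_ROOT);
  }
}
{code}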



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237487#comment-15237487
 ] 

Sangjin Lee commented on YARN-3816:
---

I had a similar question to Varun. Is there another way to handle the 
aggregation operation other than making it part of the column pre/post-fix?

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4633) TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk

2016-04-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237443#comment-15237443
 ] 

Vinod Kumar Vavilapalli commented on YARN-4633:
---

Slightly old JIRA, but [~bibinchundatt] / [~rohithsharma], is this applicable 
to 2.8.x / 2.7.x / 2.6.x also? If so, can this be backported / committed to 
those branches too? Tx.

> TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk 
> -
>
> Key: YARN-4633
> URL: https://issues.apache.org/jira/browse/YARN-4633
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.9.0
> Environment: Jenkin
>Reporter: Rohith Sharma K S
>Assignee: Bibin A Chundatt
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4633.patch
>
>
> Jenkins 
> [Build|https://builds.apache.org/job/PreCommit-YARN-Build/10366/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66.txt]
>  failed for below test case, 
> {code}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
> support was removed in 8.0
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 455.808 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartAfterPreemption[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
>   Time elapsed: 60.145 sec  <<< FAILURE!
> java.lang.AssertionError: Attempt state is not correct (timedout): expected: 
> SCHEDULED actual: FAILED for the application attempt 
> appattempt_1453461355278_0001_04
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:197)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:172)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForAttemptScheduled(MockRM.java:831)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:818)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartAfterPreemption(TestRMRestart.java:2352)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237426#comment-15237426
 ] 

Hadoop QA commented on YARN-4514:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 51s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 34s 
{color} | {color:red} Patch generated 5 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 7m 16s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:e35bf0f |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798281/YARN-4514-YARN-3368.6.patch
 |
| JIRA Issue | YARN-4514 |
| Optional Tests |  asflicense  |
| uname | Linux ec7542f9784e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / e35bf0f |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/11052/artifact/patchprocess/whitespace-tabs.txt
 |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/11052/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C:  hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui   .  U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11052/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, 
> YARN-4514-YARN-3368.6.patch
>
>
> We have several configurations that are hard-coded, for example RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237387#comment-15237387
 ] 

sandflee commented on YARN-4940:


thanks [~kshukla], the test failures seem unrelated; I'll check them later 
and I'll add a test

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch
>
>
> 1,   add a node to exclude file
> 2,   start RM
> 3,   run yarn  node -list -all , see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237335#comment-15237335
 ] 

Kuhu Shukla commented on YARN-4940:
---

The fix in the patch looks good. I have not looked at all the test failures 
yet. Just one comment: it might be nice to have a specific test covering this 
failure besides testUnknownNodeId, since AFAICT that test did not catch this 
specific failure.

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch
>
>
> 1,   add a node to exclude file
> 2,   start RM
> 3,   run yarn  node -list -all , see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237266#comment-15237266
 ] 

Kuhu Shukla commented on YARN-4940:
---

Thank you for reporting this [~sandflee]. Did the fix for YARN-4723 not fix the 
issue for you?

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch
>
>
> 1,   add a node to exclude file
> 2,   start RM
> 3,   run yarn  node -list -all , see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 

[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2016-04-12 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4514:
--
Attachment: YARN-4514-YARN-3368.6.patch

A few more changes done:

- Updated the LICENSE file with the changed versions.
- Cleaned up the code by removing some previously added debug logs.

[~leftnoteasy] and [~varun_saxena], please help check the patch.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, 
> YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, 
> YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, 
> YARN-4514-YARN-3368.6.patch
>
>
> We have several configurations that are hard-coded, for example RM/ATS 
> addresses; we should make them configurable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237251#comment-15237251
 ] 

Hadoop QA commented on YARN-4947:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 58s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 147m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237227#comment-15237227
 ] 

Hadoop QA commented on YARN-4940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 52s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 164m 24s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | hadoop.yarn.webapp.TestRMWithCSRFFilter |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
| 

[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237183#comment-15237183
 ] 

Bibin A Chundatt commented on YARN-4909:


Thanks [~vvasudev]/[~Naganarasimha]/[~sunilg] for looking into the issue. There 
still exists a probability for the port to be the same, but it will be very low.
Attaching a patch for the same.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4909:
---
Attachment: 0005-YARN-4909.patch

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2016-04-12 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237156#comment-15237156
 ] 

Varun Vasudev commented on YARN-4006:
-

[~gss2002] - have you gotten a chance to look at HADOOP-9054 and HADOOP-12082? 
HADOOP-12082 looks similar to the problem you're trying to solve.

With respect to your patch, can you elaborate on how AMs will authenticate with 
the timeline server? Are you passing the credentials to the AM as part of the 
job submission?

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch
>
>
> When attempting to use The Hadoop Alternate Authentication Classes. They do 
> not exactly work with what was built with 
> https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a Custom 
> AltKerberos DelegationToken custom class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
>   String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
>   LOG.info("AuthType Configured: " + authType);
>   if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>         PseudoDelegationTokenAuthenticationHandler.class.getName());
>     LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
>   } else if (authType.equals(KerberosAuthenticationHandler.TYPE) ||
>       (UserGroupInformation.isSecurityEnabled() &&
>        conf.get("hadoop.security.authentication")
>            .equals(KerberosAuthenticationHandler.TYPE))) {
>     if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
>       filterConfig.put(AuthenticationFilter.AUTH_TYPE, authType);
>       LOG.info("AuthType: " + authType);
>     } else {
>       filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>           KerberosDelegationTokenAuthenticationHandler.class.getName());
>       LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>     }
>     // Resolve _HOST into bind address
>     String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>     String principal =
>         filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>     if (principal != null) {
>       try {
>         principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
>       } catch (IOException ex) {
>         throw new RuntimeException(
>             "Could not resolve Kerberos principal name: " + ex.toString(),
>             ex);
>       }
>       filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL, principal);
>     }
>   }
>  {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237144#comment-15237144
 ] 

Sunil G commented on YARN-4909:
---

Yes [~Naganarasimha Garla],
We need that to be a random number.

{{9998 + rnd.nextInt() % 500}}, something like this, so it can be random when 
called in parallel. However, this does not fully solve the problem; there can 
still be corner cases, but the probability is very low.
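
A minimal sketch of the suggested offset (hypothetical, not an actual patch; 
{{Random.nextInt(bound)}} is used because {{nextInt() % 500}} could yield a 
negative offset):
{code}
import java.util.Random;

public class PortPicker {
  // Pick a candidate port at a random offset above the default Jersey test
  // port, so tests starting in parallel are unlikely to collide.
  static int pickCandidatePort() {
    int basePort = 9998; // default Jersey test port assumed here
    return basePort + new Random().nextInt(500);
  }
}
{code}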

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237140#comment-15237140
 ] 

Naganarasimha G R commented on YARN-4948:
-

Hi [~wjlei], the patch does not seem to compile. Can you please check?

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237134#comment-15237134
 ] 

Naganarasimha G R commented on YARN-4909:
-

Well, even if we go for [~vvasudev]'s solution, we can't go for a fixed 
addition or subtraction; it has to be a random number.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4810) NM applicationpage cause internal error 500

2016-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237128#comment-15237128
 ] 

Hudson commented on YARN-4810:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9597 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9597/])
YARN-4810. NM applicationpage cause internal error 500. Contributed by 
(naganarasimha_gr: rev 437e9d6475a91cafc4c993b206312912b5f13ad9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMAppsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ApplicationPage.java


> NM applicationpage cause internal error 500
> ---
>
> Key: YARN-4810
> URL: https://issues.apache.org/jira/browse/YARN-4810
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4810.patch, 0002-YARN-4810.patch, 1.png, 2.png
>
>
> Use url /node/application/
> *Case 1*
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.<init>(AppInfo.java:45)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58)
> ... 44 more
> {noformat}
> *Case 2*
> {noformat}
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.util.NoSuchElementException
> at 
> com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
> at 
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:131)
> at 
> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:126)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:79)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58)
> ... 44 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2567) Add a percentage-node threshold for RM to wait for new allocations after restart/failover

2016-04-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237129#comment-15237129
 ] 

Jason Lowe commented on YARN-2567:
--

The problem with delaying or otherwise making the state store operations 
asynchronous with the state changes they are intended to record is that it 
will always lead to inconsistent recovery if we fail between the state change 
and the state store operation.  IMHO we cannot let the NM registration 
complete, or at least cannot start using the node in a way that is 
inconsistent with the state as currently recorded in the state store, until 
the state store operation completes.  So we might be able to let the node 
register, but we should not allocate and launch new containers on it until the 
state store update completes, or else we end up with the problem described 
above.

In general there needs to be a minimal performance expectation from the state 
store for a given cluster setup or the RM is going to do some bad things.  For 
example, we can't sustain a situation where applications are being submitted at 
a rate faster than we can record them to the state store.  Similarly for large 
clusters it's going to be problematic if a large network cut occurs and we need 
to record the expiration of 1000's of containers but can't do so in a 
reasonable timeframe.  If we tell applications that containers on those nodes 
are lost _before_ we record the lost node in the state store, then if we fail 
over before the node re-joins, the new RM instance won't know it's supposed to 
kill the containers on the rejoining node.  AMs probably won't appreciate 
being told a container has completed only to have it keep running and count 
against their user limits/headroom in the future.  Therefore we have to record 
the node as lost in the state store before we inform AMs the containers and 
node are gone.


> Add a percentage-node threshold for RM to wait for new allocations after 
> restart/failover
> -
>
> Key: YARN-2567
> URL: https://issues.apache.org/jira/browse/YARN-2567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> This is the remaining part of YARN-2001 - to halt allocations after restart 
> till x% of nodes sync back with the RM. This is useful for avoiding bad 
> scheduling during the time the nodes are still joining back after a 
> restart/failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237127#comment-15237127
 ] 

Naganarasimha G R commented on YARN-4909:
-

Hi [~vvasudev],
We too had discussions here along similar lines:
bq. Don't undo the JerseyTest options to set a port. If a user has provided a 
port via the system properties, we should honor it.
If we look at the current implementation of 
{{JerseyTestBase.initializeJerseyPort}}, we are simply overriding "jerseyPort" 
via {{System.setProperty("jersey.test.port", Integer.toString(jerseyPort));}}, 
so here too, if the user has provided the port, it gets overridden by the 
code. Also, is there a possibility for this property to be set by the user? If 
so, we can do it this way: an overridden {{getPort}} can check whether 
*"jersey.test.port"* is set and, if it is, use that system-property-configured 
port as the argument for {{ServerSocketUtil.getPort}}; else use {{port}} (see 
the sketch below).
bq. I'm not convinced the current patch will fix the issue - it'll probably 
make the occurrences less frequent.
Initially I felt the same, but {{ServerSocketUtil.getPort}} has been designed 
to work in that way itself. IIRC, the solution was discussed as per this 
[comment|https://issues.apache.org/jira/browse/YARN-3528?focusedCommentId=14564091=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564091]
 and they also wanted to respect allocating the port which was initially 
given. As per the test results, the test failure probability is lower. And 
whatever approach we take, there would be a slight possibility that the ports 
can overlap, right?
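
A minimal sketch of that fallback (hypothetical; the exact {{JerseyTestBase}} 
hook and the {{ServerSocketUtil.getPort}} retry count are assumptions):
{code}
import java.io.IOException;
import org.apache.hadoop.net.ServerSocketUtil;

public class JerseyPortResolver {
  // Honor a user-supplied "jersey.test.port" if present, otherwise start
  // from the given default; ServerSocketUtil.getPort probes for a free
  // port, retrying with random candidates on bind failure.
  static int resolveJerseyPort(int defaultPort) throws IOException {
    String configured = System.getProperty("jersey.test.port");
    int seed = (configured != null)
        ? Integer.parseInt(configured) : defaultPort;
    return ServerSocketUtil.getPort(seed, 10);
  }
}
{code}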

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4932) (Umbrella) YARN/MR test failures on Windows

2016-04-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4932:
-
Summary: (Umbrella) YARN/MR test failures on Windows  (was: (Umbrella) YARN 
test failures on Windows)

> (Umbrella) YARN/MR test failures on Windows
> ---
>
> Key: YARN-4932
> URL: https://issues.apache.org/jira/browse/YARN-4932
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Junping Du
>
> We found several test failures related to Windows. Here is Umbrella jira to 
> track them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237102#comment-15237102
 ] 

Sunil G commented on YARN-4909:
---

I agree with [~vvasudev]. The current fix will make this issue less frequent, 
but it will not solve it 100%. I think some recent test case went in which is 
not closing its port; need to dig in. However, the newly suggested approach 
looks fine. We can try adding a +/- offset to the port and see.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237083#comment-15237083
 ] 

Varun Vasudev commented on YARN-4909:
-

Couple of points about the patch -
# Don't undo the JerseyTest options to set a port. If a user has provided a 
port via the system properties, we should honor it.
# I'm not convinced the current patch will fix the issue - it'll probably make 
the occurrences less frequent. The problem is that the point at which we pick 
the port and the point at which we bind are different. So two tests could both 
start up, check that 9998 is free, and then race to bind to it first (see the 
illustration below). What you could do is add/subtract a small random number 
from 9998 so that two containers that start up together don't go for the same 
port. What do you think?
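
An illustration of that check-then-bind race (hypothetical; not code from the 
patch):
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class PortRace {
  // The availability probe and the real bind are separate operations, so a
  // parallel test can grab the port in the window between them.
  static boolean looksFree(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return true; // port was free at probe time ...
    } catch (IOException e) {
      return false;
    }
  } // ... but the probe socket is closed here, before the test container binds
}
{code}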

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237071#comment-15237071
 ] 

Hadoop QA commented on YARN-4940:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 62 unchanged - 0 fixed = 63 total (was 62) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 56s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 21s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.webapp.TestRMWithCSRFFilter |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237043#comment-15237043
 ] 

Bibin A Chundatt commented on YARN-4909:


Yes, we are running in parallel.
{noformat}
mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-0 -Ptest-patch 
-Pparallel-tests -P!shelltest -Pnative -Drequire.libwebhdfs -Drequire.snappy 
-Drequire.openssl -Drequire.fuse -Drequire.test.libhadoop clean test -fae > 
/testptch/hadoop/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt
 2
{noformat}

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237040#comment-15237040
 ] 

Bibin A Chundatt commented on YARN-4947:


IIUC, {{MockRM#drainEvents()}} will loop infinitely: {{GenericEventHandler}} 
will add events that are never drained, and since the Dispatcher is never 
started, {{isDrained}} will always return false.
{noformat}
  public void await() {
    while (!isDrained()) {
      Thread.yield();
    }
  }
{noformat}

Attaching a patch to fix the same.
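
A hypothetical sketch of a bounded drain (not the attached patch), which bails 
out instead of spinning forever when the dispatcher never starts:
{code}
import java.util.concurrent.TimeoutException;

public class BoundedDrain {
  interface DrainCheck {
    boolean isDrained();
  }

  // Spin until drained, but give up after a deadline so a dispatcher that
  // was never started cannot hang the test forever.
  static void await(DrainCheck check, long timeoutMs) throws TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.isDrained()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException(
            "Events not drained within " + timeoutMs + " ms");
      }
      Thread.yield();
    }
  }
}
{code}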

> Test timeout is happening for TestRMWebServicesNodes
> 
>
> Key: YARN-4947
> URL: https://issues.apache.org/jira/browse/YARN-4947
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4947.patch
>
>
> Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 
> [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237038#comment-15237038
 ] 

Varun Vasudev commented on YARN-4909:
-

TestRMWebServices has been in the code for a long time. What's started these 
failures? Are we running tests in parallel?

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.<init>(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237032#comment-15237032
 ] 

Hadoop QA commented on YARN-4948:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 29s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 29s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 34s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 14 new + 
212 unchanged - 0 fixed = 226 total (was 212) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 14 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 12s 
{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 19s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 16s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does 

[jira] [Updated] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes

2016-04-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4947:
---
Attachment: 0001-YARN-4947.patch

> Test timeout is happening for TestRMWebServicesNodes
> 
>
> Key: YARN-4947
> URL: https://issues.apache.org/jira/browse/YARN-4947
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4947.patch
>
>
> Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 
> [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2016-04-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237024#comment-15237024
 ] 

Varun Saxena commented on YARN-3816:


Had a quick scan of the patch. There seem to be multiple aggregation 
operations. If we append the operation to the column qualifier, then with 4 
aggregation operations we would need to create 4 single-column value filters 
for a single metric; i.e., if the metric filter says metric1 > 40, we would 
have to build a filter list like
metric1=SUM > 40 OR metric1=AVG > 40 OR metric1=NOOP > 40, and so on (a sketch 
follows below).

Will these aggregation operations also be required by offline aggregation 
(YARN-3817)?
If yes, can there be some other mechanism to indicate the aggregation 
operation instead of appending it to the column qualifier?
Configuring it in some way was a suggestion given earlier.

cc [~sjlee0]
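
For illustration, a minimal sketch of what that per-operation filter list 
could look like, assuming a hypothetical {{metric=OP}} qualifier layout and 
the stock HBase 1.x filter API; the column family {{m}} and the helper name 
are illustrative only, not the patch's actual code:
{code}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricFilterSketch {
  // Hypothetical layout: one column qualifier per (metric, aggregation op)
  // pair, e.g. "metric1=SUM".
  private static final byte[] METRICS_FAMILY = Bytes.toBytes("m");

  /** Builds "metric=SUM > threshold OR metric=AVG > threshold OR ...". */
  public static Filter buildMetricFilter(String metric, long threshold,
      String... aggregationOps) {
    FilterList orList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    for (String op : aggregationOps) {
      // One SingleColumnValueFilter per aggregation operation, because the
      // operation is baked into the column qualifier itself.
      orList.addFilter(new SingleColumnValueFilter(
          METRICS_FAMILY,
          Bytes.toBytes(metric + "=" + op),
          CompareOp.GREATER,
          Bytes.toBytes(threshold)));
    }
    return orList;
  }
}
{code}
e.g. {{buildMetricFilter("metric1", 40L, "SUM", "AVG", "NOOP")}} yields the 
three-way OR above; every additional aggregation op adds one more filter per 
queried metric.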

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237023#comment-15237023
 ] 

Hadoop QA commented on YARN-4948:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} docker {color} | {color:blue} 0m 5s 
{color} | {color:blue} Dockerfile 
'/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/dev-support/docker/Dockerfile'
 not found, falling back to built-in. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 11m 11s 
{color} | {color:red} Docker failed to build yetus/hadoop:date2016-04-12. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798246/YARN-4948-branch-2.7.0.001.patch
 |
| JIRA Issue | YARN-4948 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11048/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237021#comment-15237021
 ] 

Naganarasimha G R commented on YARN-4948:
-

Oops, it seems you updated the patch at almost the same time I changed the 
status. Anyway, since the patch does not apply to trunk, please rebase it and 
then change the status. Also, IMO I would suggest waiting until the interface 
is developed as part of YARN-4231.
[~wangda], can you assign this jira to [~wjlei] and add me to the committers 
list?

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4168) Test TestLogAggregationService.testLocalFileDeletionOnDiskFull failing

2016-04-12 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237020#comment-15237020
 ] 

Takashi Ohnishi commented on YARN-4168:
---

Thank you [~vinodkv] for reviewing and committing! :)


> Test TestLogAggregationService.testLocalFileDeletionOnDiskFull failing
> --
>
> Key: YARN-4168
> URL: https://issues.apache.org/jira/browse/YARN-4168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Assignee: Takashi Ohnishi
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4168.1.patch, YARN-4168.2.patch, YARN-4168.3.patch
>
>
> {{TestLogAggregationService.testLocalFileDeletionOnDiskFull}} failing on 
> [Jenkins build 
> 1136|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk/1136/testReport/junit/org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation/TestLogAggregationService/testLocalFileDeletionOnDiskFull/]
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:229)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:285)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4948:
--
Attachment: YARN-4948-branch-2.7.0.001.patch

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237007#comment-15237007
 ] 

sandflee commented on YARN-4940:


Rather than converting UnknownNodeId, using NodeId directly seems simpler and 
more reasonable (see the sketch below). cc [~jlowe] [~kshukla]
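
A minimal sketch of that direction, assuming the standard 
{{NodeId.newInstance}} factory; the class, method name, and the port -1 
sentinel are illustrative only, not the actual patch:
{code}
import org.apache.hadoop.yarn.api.records.NodeId;

public final class UnknownNodeIds {
  private UnknownNodeIds() {
  }

  /**
   * Returns a plain, PB-backed NodeId for a host from the exclude list that
   * never registered. Unlike NodesListManager$UnknownNodeId (a bare NodeId
   * subclass), the record returned by NodeId.newInstance can be cast to
   * NodeIdPBImpl when NodeReportPBImpl serializes the report, avoiding the
   * ClassCastException below. Port -1 is a hypothetical "unknown" sentinel.
   */
  public static NodeId forDecommissionedHost(String hostName) {
    return NodeId.newInstance(hostName, -1);
  }
}
{code}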

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch
>
>
> 1. add a node to the exclude file
> 2. start RM
> 3. run yarn node -list -all and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> 

[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4948:
--
Attachment: YARN-4948.001.patch

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: YARN-4948.001.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237005#comment-15237005
 ] 

Bibin A Chundatt commented on YARN-4909:


[~Naganarasimha]
Thank you for looking into the patch.
The testcase failures are already tracked as part of the umbrella YARN-4478.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4948:
--
Attachment: (was: Node-labels-store-in-zookeeper.patch)

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4940:
---
Attachment: YARN-4940.02.patch

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch, YARN-4940.02.patch
>
>
> 1. add a node to the exclude file
> 2. start RM
> 3. run yarn node -list -all and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> 

[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237000#comment-15237000
 ] 

Naganarasimha G R commented on YARN-4909:
-

Hi [~bibinchundatt],
I am fine with the approach taken. If we don't override *getPort*, we would 
need two separate methods for @Before and @BeforeClass; and anyway, 
*JerseyTest.getPort* only reads the port from the system property and 
validates that it is configured correctly (a sketch of the pattern follows 
below). All the other test case failures reported in the latest run are not 
related to the patch; [~bibinchundatt], please confirm the same.
[~wangda], if you are fine with the approach I will go ahead and commit.
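
For reference, a minimal sketch of the port-selection pattern under 
discussion, assuming Jersey 1.x's {{JerseyTest}}, which reads the test port 
from the {{jersey.test.port}} system property (falling back to its default, 
9998); the helper class and method names are illustrative:
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class JerseyTestPorts {
  /**
   * Picks a currently-free ephemeral port and publishes it through the
   * jersey.test.port system property consulted by JerseyTest.getPort().
   * Another process could still grab the port before the test container
   * binds it, so this narrows, rather than eliminates, the race behind
   * "Address already in use".
   */
  public static int useRandomFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      int port = socket.getLocalPort();
      System.setProperty("jersey.test.port", String.valueOf(port));
      return port;
    }
  }
}
{code}
Calling this from a @Before (or @BeforeClass) method gives each test class 
its own port instead of contending for the shared default.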

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4948:
--
Attachment: (was: Node-labels-store-in-zookeeper-2.7.0.patch)

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: Node-labels-store-in-zookeeper.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4948:
--
Attachment: Node-labels-store-in-zookeeper.patch

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: Node-labels-store-in-zookeeper.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236949#comment-15236949
 ] 

Hadoop QA commented on YARN-4948:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4948 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12798236/Node-labels-store-in-zookeeper-2.7.0.patch
 |
| JIRA Issue | YARN-4948 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/11045/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: Node-labels-store-in-zookeeper-2.7.0.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-4948:
--
Attachment: Node-labels-store-in-zookeeper-2.7.0.patch

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
> Attachments: Node-labels-store-in-zookeeper-2.7.0.patch
>
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node

2016-04-12 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4940:
---
Attachment: YARN-4940.01.patch

> yarn node -list -all failed if RM start with decommissioned node
> 
>
> Key: YARN-4940
> URL: https://issues.apache.org/jira/browse/YARN-4940
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4940.01.patch
>
>
> 1. add a node to the exclude file
> 2. start RM
> 3. run yarn node -list -all and see the following exception
> {quote}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>   at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> 

[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236934#comment-15236934
 ] 

Naganarasimha G R commented on YARN-4948:
-

[~wangda], if you are fine with it, we can move this jira to be a sub-jira of 
YARN-2492.

> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236931#comment-15236931
 ] 

Naganarasimha G R commented on YARN-4948:
-

Hi [~wjlei], I think this is very much required, and it is related to 
YARN-4881. Do you have a patch ready for this? As no patch is attached, I am 
changing the status. If you want to work on it, I would suggest waiting for 
YARN-4231, which exposes an interface, and then basing your implementation on 
it. Please let me know so that I can assign it to you.
cc/ [~wangda] thoughts ?


> Support node labels store in zookeeper
> --
>
> Key: YARN-4948
> URL: https://issues.apache.org/jira/browse/YARN-4948
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: jialei weng
>
> Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236918#comment-15236918
 ] 

Bibin A Chundatt commented on YARN-4909:


Added YARN-4947 to track the 
{{org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes}} 
timeout.

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4948) Support node labels store in zookeeper

2016-04-12 Thread jialei weng (JIRA)
jialei weng created YARN-4948:
-

 Summary: Support node labels store in zookeeper
 Key: YARN-4948
 URL: https://issues.apache.org/jira/browse/YARN-4948
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: jialei weng


Support node labels store in zookeeper



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes

2016-04-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4947:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> Test timeout is happening for TestRMWebServicesNodes
> 
>
> Key: YARN-4947
> URL: https://issues.apache.org/jira/browse/YARN-4947
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 
> [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4893) Fix some intermittent test failures in TestRMAdminService

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236909#comment-15236909
 ] 

Bibin A Chundatt commented on YARN-4893:


Test timeout is happening for TestRMWebServicesNodes; added YARN-4947 to track 
the same.

> Fix some intermittent test failures in TestRMAdminService
> -
>
> Key: YARN-4893
> URL: https://issues.apache.org/jira/browse/YARN-4893
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-4893-002.patch, YARN-4893-003.patch, YARN-4893.patch
>
>
> As discussed in YARN-998, we need to add rm.drainEvents() after 
> rm.registerNode(), or some of the tests could fail intermittently. Also, we 
> can consider adding rm.drainEvents() within rm.registerNode(), which would 
> be more convenient (see the sketch below).
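
A minimal sketch of the pattern being proposed, using the MockRM/MockNM test 
utilities and the drainEvents() call named on this jira; the node address and 
memory size are arbitrary:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;

public class DrainAfterRegisterSketch {
  public void registerAndDrain() throws Exception {
    Configuration conf = new YarnConfiguration();
    MockRM rm = new MockRM(conf);
    rm.start();
    // Register a node, then drain the async dispatcher so later assertions
    // do not race with the still-in-flight node-added event.
    MockNM nm = rm.registerNode("127.0.0.1:1234", 8 * 1024);
    rm.drainEvents();
    // ... assertions against the RM's view of the node go here ...
    rm.stop();
  }
}
{code}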



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes

2016-04-12 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4947:
--

 Summary: Test timeout is happening for TestRMWebServicesNodes
 Key: YARN-4947
 URL: https://issues.apache.org/jira/browse/YARN-4947
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 

[timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/]




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2016-04-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236902#comment-15236902
 ] 

Naganarasimha G R commented on YARN-3971:
-

Thanks for the latest patch [~bibinchundatt]; the patch LGTM. 
[~wangda], if you are ok with the approach taken, I will go ahead and commit 
the addendum patch.

> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
> recovery
> --
>
> Key: YARN-3971
> URL: https://issues.apache.org/jira/browse/YARN-3971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
> 0003-YARN-3971.patch, 0004-YARN-3971.patch, 
> 0005-YARN-3971.001.addendum.patch, 0005-YARN-3971.addendum.patch, 
> 0005-YARN-3971.patch
>
>
> Steps to reproduce 
> # Create labels x,y
> # Delete labels x,y
> # Create labels x,y and add capacity-scheduler xml entries for labels x and y
> # Restart RM 
>  
> Both RMs will become Standby,
> since the below exception is thrown in {{FileSystemNodeLabelsStore#recover}}:
> {code}
> 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
> queue=a1 is using this label. Please remove label on queue before remove the 
> label
> java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
> label. Please remove label on queue before remove the label
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-4909:
--

Assignee: Bibin A Chundatt

> Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
> ---
>
> Key: YARN-4909
> URL: https://issues.apache.org/jira/browse/YARN-4909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Brahma Reddy Battula
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, 
> 0003-YARN-4909.patch, 0004-YARN-4909.patch
>
>
>  *Precommit link* 
> https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/
> *Trace* 
> {noformat}
> com.sun.jersey.test.framework.spi.container.TestContainerException: 
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
>   at 
> org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
>   at 
> org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
>   at 
> org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
>   at 
> com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86)
>   at 
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
>   at 
> com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
>   at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217)
>   at 
> org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4897) dataTables_wrapper change min height

2016-04-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236801#comment-15236801
 ] 

Bibin A Chundatt commented on YARN-4897:


[~rohithsharma]
Thank you for the review and commit.

> dataTables_wrapper change min height
> 
>
> Key: YARN-4897
> URL: https://issues.apache.org/jira/browse/YARN-4897
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4897.patch, Border and DefaultHeight.png
>
>
> In the case of dataTables_wrapper, the min-height is 302px; it needs to be 
> set to 10px.
> For pages containing 2 tables, min-height=302 causes a layout problem: when 
> dataTables_wrapper is inside a DIV, it renders with a border at the full 
> min-height.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter

2016-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236778#comment-15236778
 ] 

Hadoop QA commented on YARN-4909:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 58s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 38s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 195m 47s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||

[jira] [Commented] (YARN-2567) Add a percentage-node threshold for RM to wait for new allocations after restart/failover

2016-04-12 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236724#comment-15236724
 ] 

sandflee commented on YARN-2567:


There may be one problem: if the NM is recovered in a finished state but then registers with running containers, we would normally kill those containers. The following sequence shows how that can go wrong:
1. The NM is LOST and the RM stores the LOST status successfully.
2. The RM fails over and the NM is recovered as LOST.
3. The NM registers and becomes RUNNING, {color:red}but the RM fails to store the RUNNING state, or the store is delayed{color}.
4. The RM allocates a container on the NM, and the container runs there.
5. The RM fails over again and the NM is recovered as LOST.
6. The NM registers with the RM, and the RM kills the container running on it; this is not expected.

To fix this, one solution is to store the NM status first and only then let the NM become RUNNING (see the sketch below), but this may delay NM registration on a big cluster.
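
A minimal sketch of that write-ahead ordering (all class and method names here are hypothetical, not actual YARN APIs):

{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: persist the node's RUNNING state before exposing the node as
// RUNNING in memory, so a later failover cannot recover it as LOST while
// containers are being allocated to it.
public class NodeRegistrationSketch {

  enum NodeState { LOST, RUNNING }

  /** Stand-in for the RM state store (hypothetical interface). */
  interface NodeStateStore {
    void storeNodeState(String nodeId, NodeState state) throws IOException;
  }

  private final NodeStateStore store;
  private final ConcurrentMap<String, NodeState> liveNodes = new ConcurrentHashMap<>();

  NodeRegistrationSketch(NodeStateStore store) {
    this.store = store;
  }

  // The store write happens first; only after it succeeds does the node
  // become RUNNING in memory. The cost is that registration now blocks on
  // the store write, which is the registration delay mentioned above.
  void registerNode(String nodeId) throws IOException {
    store.storeNodeState(nodeId, NodeState.RUNNING); // write-ahead
    liveNodes.put(nodeId, NodeState.RUNNING);        // now visible to the scheduler
  }

  public static void main(String[] args) throws IOException {
    NodeRegistrationSketch rm = new NodeRegistrationSketch(
        (id, state) -> System.out.println("stored " + id + " -> " + state));
    rm.registerNode("nm-host-1:45454");
    System.out.println("live nodes: " + rm.liveNodes);
  }
}
{code}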

> Add a percentage-node threshold for RM to wait for new allocations after 
> restart/failover
> -
>
> Key: YARN-2567
> URL: https://issues.apache.org/jira/browse/YARN-2567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> This is the remaining part of YARN-2001: halt allocations after a restart 
> until x% of nodes have synced back with the RM. This is useful for avoiding 
> bad scheduling while nodes are still joining back after a restart/failover.
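
A minimal sketch of such a percentage gate (hypothetical names, not actual YARN code; it assumes the RM knows how many nodes it had before the restart):

{code:java}
// Sketch only: the scheduler would consult allocationsAllowed() and hold new
// allocations until enough of the previously known nodes have re-registered.
public class AllocationGateSketch {

  private final int expectedNodes;  // nodes known before the restart/failover
  private final double threshold;   // e.g. 0.8 for an 80% threshold
  private int registeredNodes;

  AllocationGateSketch(int expectedNodes, double threshold) {
    this.expectedNodes = expectedNodes;
    this.threshold = threshold;
  }

  synchronized void onNodeRegistered() {
    registeredNodes++;
  }

  // Allocations stay halted until registeredNodes reaches x% of expectedNodes.
  synchronized boolean allocationsAllowed() {
    return registeredNodes >= Math.ceil(expectedNodes * threshold);
  }

  public static void main(String[] args) {
    AllocationGateSketch gate = new AllocationGateSketch(100, 0.8);
    for (int i = 0; i < 80; i++) {
      gate.onNodeRegistered();
    }
    System.out.println("allocations allowed: " + gate.allocationsAllowed()); // true
  }
}
{code}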



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.

2016-04-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved YARN-3639.

Resolution: Duplicate

> It takes too long time for RM to recover all apps if the original active RM 
> and NN go down at the same time.
> 
>
> Key: YARN-3639
> URL: https://issues.apache.org/jira/browse/YARN-3639
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Xianyin Xin
> Attachments: YARN-3639-recovery_log_1_app.txt
>
>
> If the active RM and NN go down at the same time, the new RM will take a 
> long time to recover all apps. After analysis, we found the root cause to be 
> the renewal of HDFS tokens during the recovery process. The HDFS client 
> created by the renewer first tries to connect to the original NN, which 
> times out after 10~20s, and only then connects to the new NN. The entire 
> recovery costs about 15 * #apps seconds according to our tests.
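
To make the linear scaling concrete, a back-of-the-envelope sketch (the ~15s per-app figure is taken from the description above; the app counts are illustrative):

{code:java}
public class RecoveryCostSketch {
  public static void main(String[] args) {
    final int secondsPerApp = 15; // per-app timeout against the old NN, per the report
    for (int apps : new int[] {100, 1_000, 10_000}) {
      long totalSeconds = (long) apps * secondsPerApp;
      System.out.printf("%,6d apps -> ~%,d s (~%.1f h)%n",
          apps, totalSeconds, totalSeconds / 3600.0);
    }
  }
}
{code}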



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.

2016-04-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved YARN-3639.

Resolution: Fixed

> It takes too long time for RM to recover all apps if the original active RM 
> and NN go down at the same time.
> 
>
> Key: YARN-3639
> URL: https://issues.apache.org/jira/browse/YARN-3639
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Xianyin Xin
> Attachments: YARN-3639-recovery_log_1_app.txt
>
>
> If the active RM and NN go down at the same time, the new RM will take a 
> long time to recover all apps. After analysis, we found the root cause to be 
> the renewal of HDFS tokens during the recovery process. The HDFS client 
> created by the renewer first tries to connect to the original NN, which 
> times out after 10~20s, and only then connects to the new NN. The entire 
> recovery costs about 15 * #apps seconds according to our tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >