[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-05-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562380#comment-14562380
 ] 

Varun Saxena commented on YARN-3051:


Thanks for the replies.




> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562367#comment-14562367
 ] 

Sangjin Lee commented on YARN-3721:
---

See 
https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14514872&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14514872

:)

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562353#comment-14562353
 ] 

Zhijie Shen commented on YARN-3721:
---

+1 for the approach.

BTW, there is always a risk here: HBase is downstream of Hadoop and compiles 
against a particular Hadoop version. Say we use HBase X.X.X, which depends on 
Hadoop Y.Y.Y. However, if on trunk/branch-2 we make an incompatible change to the 
mini DFS cluster for a later release Y.Y.Z, our test cases are very likely to 
break, because the HBase test utils compiled against Y.Y.Y are no longer 
compatible with the Y.Y.Z runtime libs.

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3731) Unknown container. Container either has not started or has already completed or doesn’t belong to this node at all.

2015-05-27 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562334#comment-14562334
 ] 

Devaraj K commented on YARN-3731:
-

[~amitmsbi], can you check the application/job logs for the failed application? 
You can probably find the reason there.

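For reference, once log aggregation has completed, the per-container logs for the 
failed attempt can usually be pulled with the {{yarn}} CLI (application id taken 
from the report below):

{noformat}
yarn logs -applicationId application_1432698911303_0005
{noformat}
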
> Unknown container. Container either has not started or has already completed 
> or doesn’t belong to this node at all. 
> 
>
> Key: YARN-3731
> URL: https://issues.apache.org/jira/browse/YARN-3731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: amit
>Priority: Critical
>
> Hi 
> I am importing data from sql server to hdfs and below is the command
> sqoop import --connect 
> "jdbc:sqlserver://Servername:1433;username=hadoop;password=Password;database=MSBI"
>  --table DimDate --target-dir /Hadoop/hdpdatadn/dn/DW/msbi
> but I am getting following error:
> User: amit.tomar
>  Name: DimDate.jar
>  Application Type: MAPREDUCE
>  Application Tags:
>  State: FAILED
>  FinalStatus: FAILED
>  Started: Wed May 27 12:39:48 +0800 2015
>  Elapsed: 23sec
>  Tracking URL: History
>  Diagnostics: Application application_1432698911303_0005 failed 2 times due 
> to AM Container for appattempt_1432698911303_0005_02 exited with 
> exitCode: 1
>  For more detailed output, check application tracking 
> page:http://ServerName/proxy/application_1432698911303_0005/Then, click on 
> links to logs of each attempt.
>  Diagnostics: Exception from container-launch.
>  Container id: container_1432698911303_0005_02_01
>  Exit code: 1
>  Stack trace: ExitCodeException exitCode=1:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>  at org.apache.hadoop.util.Shell.run(Shell.java:455)
>  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  Shell output: 1 file(s) moved.
>  Container exited with a non-zero exit code 1
>  Failing this attempt. Failing the application. 
> From the log below is the message:
> java.lang.Exception: Unknown container. Container either has not started or 
> has already completed or doesn’t belong to this node at all. 
> Thanks in advance
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562314#comment-14562314
 ] 

Hadoop QA commented on YARN-3725:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 18s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m  3s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| | |  42m 50s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735786/YARN-3725.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5450413 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8110/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8110/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8110/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8110/console |


This message was automatically generated.

> App submission via REST API is broken in secure mode due to Timeline DT 
> service address is empty
> 
>
> Key: YARN-3725
> URL: https://issues.apache.org/jira/browse/YARN-3725
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-3725.1.patch
>
>
> YARN-2971 changed TimelineClient to use the service address from the Timeline DT 
> to renew the DT instead of the configured address. This breaks the procedure of 
> submitting a YARN app via the REST API in secure mode.
> The problem is that the service address is set by the client, not the server, in 
> Java code. The REST API response is an encoded token String, so it is quite 
> inconvenient to deserialize it, set the service address, and serialize it again. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562290#comment-14562290
 ] 

Hadoop QA commented on YARN-3489:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735791/YARN-3489-branch-2.7.04.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5450413 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8111/console |


This message was automatically generated.

> RMServerUtils.validateResourceRequests should only obtain queue info once
> -
>
> Key: YARN-3489
> URL: https://issues.apache.org/jira/browse/YARN-3489
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>  Labels: BB2015-05-RFC
> Attachments: YARN-3489-branch-2.7.02.patch, 
> YARN-3489-branch-2.7.03.patch, YARN-3489-branch-2.7.04.patch, 
> YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, 
> YARN-3489.03.patch
>
>
> Since the label support was added we now get the queue info for each request 
> being validated in SchedulerUtils.validateResourceRequest.  If 
> validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
> large cluster with lots of varied locality in the requests) then it will get 
> the queue info for each request.  Since we build the queue info this 
> generates a lot of unnecessary garbage, as the queue isn't changing between 
> requests.  We should grab the queue info once and pass it down rather than 
> building it again for each request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562287#comment-14562287
 ] 

Hadoop QA commented on YARN-3727:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 38s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 38s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 18s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  42m 16s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735768/YARN-3727.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5450413 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8109/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8109/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8109/console |


This message was automatically generated.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happened due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3690) 'mvn site' fails on JDK8

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562283#comment-14562283
 ] 

Hadoop QA commented on YARN-3690:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  1s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:blue}0{color} | pre-patch |  14m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 28s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| | |  38m  6s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735598/YARN-3690-patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5450413 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8108/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8108/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8108/console |


This message was automatically generated.

> 'mvn site' fails on JDK8
> 
>
> Key: YARN-3690
> URL: https://issues.apache.org/jira/browse/YARN-3690
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, site
> Environment: CentOS 7.0, Oracle JDK 8u45.
>Reporter: Akira AJISAKA
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3690-patch
>
>
> 'mvn site' failed by the following error:
> {noformat}
> [ERROR] 
> /home/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/package-info.java:18:
>  error: package org.apache.hadoop.yarn.factories has already been annotated
> [ERROR] @InterfaceAudience.LimitedPrivate({ "MapReduce", "YARN" })
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR] at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161)
> [ERROR] at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR] at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR] at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR] at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
> [ERROR] 
> [ERROR] Command line was: /usr/java/jdk1.8.0_45/jre/../bin/javadoc 
> -J-

[jira] [Updated] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-05-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3489:
---
Attachment: YARN-3489-branch-2.7.04.patch

> RMServerUtils.validateResourceRequests should only obtain queue info once
> -
>
> Key: YARN-3489
> URL: https://issues.apache.org/jira/browse/YARN-3489
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>  Labels: BB2015-05-RFC
> Attachments: YARN-3489-branch-2.7.02.patch, 
> YARN-3489-branch-2.7.03.patch, YARN-3489-branch-2.7.04.patch, 
> YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, 
> YARN-3489.03.patch
>
>
> Since the label support was added we now get the queue info for each request 
> being validated in SchedulerUtils.validateResourceRequest.  If 
> validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
> large cluster with lots of varied locality in the requests) then it will get 
> the queue info for each request.  Since we build the queue info this 
> generates a lot of unnecessary garbage, as the queue isn't changing between 
> requests.  We should grab the queue info once and pass it down rather than 
> building it again for each request.
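
The refactoring the description calls for is essentially hoisting the queue lookup 
out of the per-request loop. A minimal illustrative sketch (the method and 
variable names below are simplified placeholders, not the actual 
RMServerUtils/SchedulerUtils signatures):

{code}
// Before: the queue info is rebuilt for every request being validated.
for (ResourceRequest req : requests) {
  QueueInfo queueInfo = scheduler.getQueueInfo(queueName, false, false);
  validateResourceRequest(req, maximumResource, queueInfo);
}

// After: fetch the queue info once and pass it down to each validation call,
// avoiding the repeated object construction (and garbage) per request.
QueueInfo queueInfo = scheduler.getQueueInfo(queueName, false, false);
for (ResourceRequest req : requests) {
  validateResourceRequest(req, maximumResource, queueInfo);
}
{code}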



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3731) Unknown container. Container either has not started or has already completed or doesn’t belong to this node at all.

2015-05-27 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562280#comment-14562280
 ] 

Rohith commented on YARN-3731:
--

Closing the issue as invalid.

> Unknown container. Container either has not started or has already completed 
> or doesn’t belong to this node at all. 
> 
>
> Key: YARN-3731
> URL: https://issues.apache.org/jira/browse/YARN-3731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: amit
>Priority: Critical
>
> Hi 
> I am importing data from sql server to hdfs and below is the command
> sqoop import --connect 
> "jdbc:sqlserver://Servername:1433;username=hadoop;password=Password;database=MSBI"
>  --table DimDate --target-dir /Hadoop/hdpdatadn/dn/DW/msbi
> but I am getting following error:
> User: amit.tomar
>  Name: DimDate.jar
>  Application Type: MAPREDUCE
>  Application Tags:
>  State: FAILED
>  FinalStatus: FAILED
>  Started: Wed May 27 12:39:48 +0800 2015
>  Elapsed: 23sec
>  Tracking URL: History
>  Diagnostics: Application application_1432698911303_0005 failed 2 times due 
> to AM Container for appattempt_1432698911303_0005_02 exited with 
> exitCode: 1
>  For more detailed output, check application tracking 
> page:http://ServerName/proxy/application_1432698911303_0005/Then, click on 
> links to logs of each attempt.
>  Diagnostics: Exception from container-launch.
>  Container id: container_1432698911303_0005_02_01
>  Exit code: 1
>  Stack trace: ExitCodeException exitCode=1:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>  at org.apache.hadoop.util.Shell.run(Shell.java:455)
>  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  Shell output: 1 file(s) moved.
>  Container exited with a non-zero exit code 1
>  Failing this attempt. Failing the application. 
> From the log below is the message:
> java.lang.Exception: Unknown container. Container either has not started or 
> has already completed or doesn’t belong to this node at all. 
> Thanks in advance
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3731) Unknown container. Container either has not started or has already completed or doesn’t belong to this node at all.

2015-05-27 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith resolved YARN-3731.
--
Resolution: Invalid

> Unknown container. Container either has not started or has already completed 
> or doesn’t belong to this node at all. 
> 
>
> Key: YARN-3731
> URL: https://issues.apache.org/jira/browse/YARN-3731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: amit
>Priority: Critical
>
> Hi 
> I am importing data from sql server to hdfs and below is the command
> sqoop import --connect 
> "jdbc:sqlserver://Servername:1433;username=hadoop;password=Password;database=MSBI"
>  --table DimDate --target-dir /Hadoop/hdpdatadn/dn/DW/msbi
> but I am getting following error:
> User: amit.tomar
>  Name: DimDate.jar
>  Application Type: MAPREDUCE
>  Application Tags:
>  State: FAILED
>  FinalStatus: FAILED
>  Started: Wed May 27 12:39:48 +0800 2015
>  Elapsed: 23sec
>  Tracking URL: History
>  Diagnostics: Application application_1432698911303_0005 failed 2 times due 
> to AM Container for appattempt_1432698911303_0005_02 exited with 
> exitCode: 1
>  For more detailed output, check application tracking 
> page:http://ServerName/proxy/application_1432698911303_0005/Then, click on 
> links to logs of each attempt.
>  Diagnostics: Exception from container-launch.
>  Container id: container_1432698911303_0005_02_01
>  Exit code: 1
>  Stack trace: ExitCodeException exitCode=1:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>  at org.apache.hadoop.util.Shell.run(Shell.java:455)
>  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  Shell output: 1 file(s) moved.
>  Container exited with a non-zero exit code 1
>  Failing this attempt. Failing the application. 
> From the log below is the message:
> java.lang.Exception: Unknown container. Container either has not started or 
> has already completed or doesn’t belong to this node at all. 
> Thanks in advance
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3731) Unknown container. Container either has not started or has already completed or doesn’t belong to this node at all.

2015-05-27 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562279#comment-14562279
 ] 

Rohith commented on YARN-3731:
--

Hi [~amitmsbi],
Thanks for using Hadoop. You are trying to access the log link for an application 
master that was never launched. From the diagnostics message it is clear that the 
AM did not launch, so first and foremost you need to check why the application 
master failed to start. There is probably an application configuration or 
classpath issue, which you can find in the container's stderr logs.

Also, JIRA is meant for tracking development activities. For usage questions, 
kindly register with the [mailing list|https://hadoop.apache.org/mailing_lists.html] 
and send mail to the users list, i.e. {{u...@hadoop.apache.org}}. Folks there will 
definitely be able to help answer or resolve your queries.

> Unknown container. Container either has not started or has already completed 
> or doesn’t belong to this node at all. 
> 
>
> Key: YARN-3731
> URL: https://issues.apache.org/jira/browse/YARN-3731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: amit
>Priority: Critical
>
> Hi 
> I am importing data from sql server to hdfs and below is the command
> sqoop import --connect 
> "jdbc:sqlserver://Servername:1433;username=hadoop;password=Password;database=MSBI"
>  --table DimDate --target-dir /Hadoop/hdpdatadn/dn/DW/msbi
> but I am getting following error:
> User: amit.tomar
>  Name: DimDate.jar
>  Application Type: MAPREDUCE
>  Application Tags:
>  State: FAILED
>  FinalStatus: FAILED
>  Started: Wed May 27 12:39:48 +0800 2015
>  Elapsed: 23sec
>  Tracking URL: History
>  Diagnostics: Application application_1432698911303_0005 failed 2 times due 
> to AM Container for appattempt_1432698911303_0005_02 exited with 
> exitCode: 1
>  For more detailed output, check application tracking 
> page:http://ServerName/proxy/application_1432698911303_0005/Then, click on 
> links to logs of each attempt.
>  Diagnostics: Exception from container-launch.
>  Container id: container_1432698911303_0005_02_01
>  Exit code: 1
>  Stack trace: ExitCodeException exitCode=1:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>  at org.apache.hadoop.util.Shell.run(Shell.java:455)
>  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  Shell output: 1 file(s) moved.
>  Container exited with a non-zero exit code 1
>  Failing this attempt. Failing the application. 
> From the log below is the message:
> java.lang.Exception: Unknown container. Container either has not started or 
> has already completed or doesn’t belong to this node at all. 
> Thanks in advance
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562253#comment-14562253
 ] 

Hadoop QA commented on YARN-3044:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 16s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 50s | The applied patch generated  1 
new checkstyle issues (total was 245, now 245). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 13s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  61m 15s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:red}-1{color} | yarn tests |   1m 12s | Tests failed in 
hadoop-yarn-server-timelineservice. |
| | | 105m 43s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent 
in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At 
AbstractTimelineServicePublisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent
 in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At AbstractTimelineServicePublisher.java:[line 79] |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent 
in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At 
AbstractTimelineServicePublisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent
 in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At AbstractTimelineServicePublisher.java:[line 76] |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent
 in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At 
AbstractTimelineServicePublisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent
 in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At AbstractTimelineServicePublisher.java:[line 73] |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent 
in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At 
AbstractTimelineServicePublisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent
 in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At AbstractTimelineServicePublisher.java:[line 67] |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to 
org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent 
in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  At 
AbstractTimelineServicePublisher.java:org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent
 in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)
  

[jira] [Updated] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty

2015-05-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3725:
--
Attachment: YARN-3725.1.patch

Posted a patch with a short-term fix for the regression on 2.7.

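For context, the inconvenience described below is roughly the following dance a 
REST client currently has to do to fill in the missing service address. This is 
only an illustrative sketch using the standard {{Token}}/{{SecurityUtil}} APIs 
(the class and method names here are made up), not the attached patch:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class TimelineDtServiceWorkaround {
  // The REST response carries the timeline DT only as a URL-safe string with an
  // empty service field, so the client must decode it, set the timeline server
  // address itself, and re-encode it before using it in app submission.
  public static String fillServiceAddress(String encodedToken,
      String timelineHost, int timelinePort) throws IOException {
    Token<TokenIdentifier> token = new Token<TokenIdentifier>();
    token.decodeFromUrlString(encodedToken);
    SecurityUtil.setTokenService(token,
        new InetSocketAddress(timelineHost, timelinePort));
    return token.encodeToUrlString();
  }
}
{code}
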
> App submission via REST API is broken in secure mode due to Timeline DT 
> service address is empty
> 
>
> Key: YARN-3725
> URL: https://issues.apache.org/jira/browse/YARN-3725
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-3725.1.patch
>
>
> YARN-2971 changed TimelineClient to use the service address from the Timeline DT 
> to renew the DT instead of the configured address. This breaks the procedure of 
> submitting a YARN app via the REST API in secure mode.
> The problem is that the service address is set by the client, not the server, in 
> Java code. The REST API response is an encoded token String, so it is quite 
> inconvenient to deserialize it, set the service address, and serialize it again. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562234#comment-14562234
 ] 

Sangjin Lee commented on YARN-3721:
---

With that explanation, I'm +1 on the patch. What do others think? If everyone 
is OK with it, I can commit this patch also.

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3731) Unknown container. Container either has not started or has already completed or doesn’t belong to this node at all.

2015-05-27 Thread amit (JIRA)
amit created YARN-3731:
--

 Summary: Unknown container. Container either has not started or 
has already completed or doesn’t belong to this node at all. 
 Key: YARN-3731
 URL: https://issues.apache.org/jira/browse/YARN-3731
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: amit
Priority: Critical




Hi 

I am importing data from sql server to hdfs and below is the command

sqoop import --connect 
"jdbc:sqlserver://Servername:1433;username=hadoop;password=Password;database=MSBI"
 --table DimDate --target-dir /Hadoop/hdpdatadn/dn/DW/msbi

but I am getting following error:

User: amit.tomar
 Name: DimDate.jar
 Application Type: MAPREDUCE
 Application Tags:
 State: FAILED
 FinalStatus: FAILED
 Started: Wed May 27 12:39:48 +0800 2015
 Elapsed: 23sec
 Tracking URL: History
 Diagnostics: Application application_1432698911303_0005 failed 2 times due to 
AM Container for appattempt_1432698911303_0005_02 exited with exitCode: 1
 For more detailed output, check application tracking 
page:http://ServerName/proxy/application_1432698911303_0005/Then, click on 
links to logs of each attempt.
 Diagnostics: Exception from container-launch.
 Container id: container_1432698911303_0005_02_01
 Exit code: 1
 Stack trace: ExitCodeException exitCode=1:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
 at org.apache.hadoop.util.Shell.run(Shell.java:455)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
 at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Shell output: 1 file(s) moved.
 Container exited with a non-zero exit code 1
 Failing this attempt. Failing the application. 

From the log below is the message:

java.lang.Exception: Unknown container. Container either has not started or has 
already completed or doesn’t belong to this node at all. 

Thanks in advance




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3652) A SchedulerMetrics may be need for evaluating the scheduler's performance

2015-05-27 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562221#comment-14562221
 ] 

Xianyin Xin commented on YARN-3652:
---

Thanks for your comments, [~vvasudev]. 
I agree with you on the general idea of the relation between SchedulerMetrics 
and SchedulerHealth. However, SchedulerHealth is currently only implemented in 
the CapacityScheduler; if we switch to using SchedulerHealth in YARN-3630, we 
would need to wait for SchedulerHealth support in the FairScheduler and expose 
{{getSchedulerHealth}} in YarnScheduler. Do we have any plan for that?

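For illustration only, a minimal sketch of what such a scheduler metrics source 
could look like with the Hadoop metrics2 library (the class and metric names are 
hypothetical and not taken from any patch):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(context = "yarn")
public class SchedulerMetricsSketch {
  // Number of scheduler events currently waiting to be handled.
  @Metric("Scheduler events pending") MutableGaugeLong pendingEvents;
  // Rate and average time of handling scheduler events (throughput and delay).
  @Metric("Scheduler event handling") MutableRate eventHandling;

  public static SchedulerMetricsSketch create() {
    return DefaultMetricsSystem.instance()
        .register("SchedulerMetricsSketch", "Scheduler performance metrics",
            new SchedulerMetricsSketch());
  }
}
{code}
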
> A SchedulerMetrics may be need for evaluating the scheduler's performance
> -
>
> Key: YARN-3652
> URL: https://issues.apache.org/jira/browse/YARN-3652
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Xianyin Xin
>
> As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating 
> the scheduler's performance. The performance indicators include the number of 
> events waiting to be handled by the scheduler, the throughput, the scheduling 
> delay, and/or other indicators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-05-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562219#comment-14562219
 ] 

zhihai xu commented on YARN-3727:
-

I attached a patch, YARN-3727.000.patch, for review. The patch checks the 
directory for localization before assigning it to {{LocalizedResource}}; if the 
directory already exists, it is deleted and the next one is tried. I did some 
profiling for this change: the performance overhead is very minor (less than 
1 ms), because the existing-directory case happens very rarely and 
{{file.exists()}} takes very little time to run.

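A rough sketch of the check described above (the class and method names are 
illustrative, not taken from the patch): before a local cache path is handed to 
the resource, any path that already exists on disk is cleaned up and the next 
candidate is used.

{code}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.fs.FileUtil;

public class LocalCacheDirCheck {
  // Returns a candidate directory that does not already exist. A leftover
  // directory (e.g. from a previous failed localization or a recovery hiccup)
  // is deleted and the next candidate id is tried instead.
  public static File pickFreshDirectory(File parent, long startId,
      int maxAttempts) throws IOException {
    for (int i = 0; i < maxAttempts; i++) {
      File candidate = new File(parent, Long.toString(startId + i));
      if (!candidate.exists()) {
        return candidate;
      }
      FileUtil.fullyDelete(candidate);  // remove the stale directory
    }
    throw new IOException("No fresh local cache directory found under " + parent);
  }
}
{code}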

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happened due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3730) scheduler reserve more resource than required

2015-05-27 Thread gu-chi (JIRA)
gu-chi created YARN-3730:


 Summary: scheduler reserve more resource than required
 Key: YARN-3730
 URL: https://issues.apache.org/jira/browse/YARN-3730
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: gu-chi


Using the capacity scheduler in an environment with 3 NMs of 9 vcores each, I ran 
a Spark job with 4 executors of 5 cores each. As I suspected, only one executor 
cannot start and should be reserved, but actually more containers than that are 
reserved. Because of this, I cannot run some other, smaller tasks. Looking at the 
capacity scheduler, the 'needContainers' method in LeafQueue.java has a 
'starvation' computation, and this causes more containers to be reserved than 
required. Any idea or suggestion on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3630) YARN should suggest a heartbeat interval for applications

2015-05-27 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562211#comment-14562211
 ] 

Xianyin Xin commented on YARN-3630:
---

Thanks for your comments, [~vvasudev]!
{quote}
your patch doesn't check if the calculated interval is greater than the ping 
interval to determine liveliness for the AM and the NM. Is that by design?
{quote}
It's true we should do that, but in this patch I haven't added a mechanism for 
determining an upper limit for {{nextHeartbeatInterval}}. I think the limit should 
be much less than the ping interval, which is 10 minutes by default. On the other 
hand, would a hard, configurable limit be acceptable? 
{quote}
With respect to adaptive heartbeats for the NMs - my concern is that the 
proposed solution will lead to behaviour where the NMs will be told to back off 
- the NMs will wait for sometime - the RM will receive a flood of NM updates - 
leading to the NMs being told to back off and so on and so forth. We'll end up 
in a situation where the pings will become clustered around particular time 
intervals, leading to container allocation and release delays. You might be 
better off picking a random interval between the default interval and the 
calculated interval to spread out the NM pings
{quote}
Thanks for the reminder; that's a situation I hadn't thought much about. I think 
your suggestion is a nice choice. 

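A small sketch of the randomized back-off suggested above (all names are 
hypothetical): the next heartbeat interval is picked uniformly between the 
default interval and the load-based value, capped well below the liveliness ping 
interval so that NM heartbeats do not cluster around the same instants.

{code}
import java.util.Random;

public class HeartbeatIntervalSuggester {
  private static final Random RANDOM = new Random();

  // Returns an interval in [defaultMs, min(calculatedMs, pingIntervalMs / 4)).
  public static long nextIntervalMs(long defaultMs, long calculatedMs,
      long pingIntervalMs) {
    long upper = Math.min(calculatedMs, pingIntervalMs / 4);
    if (upper <= defaultMs) {
      return defaultMs;  // no back-off needed
    }
    return defaultMs + (long) (RANDOM.nextDouble() * (upper - defaultMs));
  }
}
{code}
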
> YARN should suggest a heartbeat interval for applications
> -
>
> Key: YARN-3630
> URL: https://issues.apache.org/jira/browse/YARN-3630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.0
>Reporter: Zoltán Zvara
>Assignee: Xianyin Xin
>Priority: Minor
> Attachments: Notes_for_adaptive_heartbeat_policy.pdf, 
> YARN-3630.001.patch.patch, YARN-3630.002.patch
>
>
> It seems currently applications - for example Spark - are not adaptive to RM 
> regarding heartbeat intervals. RM should be able to suggest a desired 
> heartbeat interval to applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-27 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3644:
---
Assignee: Raju Bairishetti  (was: Jun Gong)

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Raju Bairishetti
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all 
> the NMs shut themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non-connection failures are retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-27 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562210#comment-14562210
 ] 

Jun Gong commented on YARN-3644:


Sorry, by mistake...

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Jun Gong
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all 
> the NMs shut themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non-connection failures are retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-27 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562209#comment-14562209
 ] 

Jun Gong commented on YARN-3644:


Sorry, by mistake...

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Jun Gong
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all 
> the NMs shut themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non-connection failures are retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-27 Thread Raju Bairishetti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562206#comment-14562206
 ] 

Raju Bairishetti commented on YARN-3644:


[~hex108], is there any pending work on this JIRA that you plan to take up, or 
did you assign it to yourself by mistake?


> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Jun Gong
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all 
> the NMs shut themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
> effects, where non-connection failures are retried infinitely by all 
> YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3729) Modify the yarn CLI to be able to read the ConcatenatableAggregatedLogFormat

2015-05-27 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-3729:
---

 Summary: Modify the yarn CLI to be able to read the 
ConcatenatableAggregatedLogFormat
 Key: YARN-3729
 URL: https://issues.apache.org/jira/browse/YARN-3729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter


When serving logs, the {{yarn}} CLI needs to be able to read the 
ConcatenatableAggregatedLogFormat or the AggregatedLogFormat transparently.
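Since the on-disk layout of either format is not spelled out in this thread, the following is only a sketch of the dispatch shape such a CLI change would need: peek at the head of the file and pick a reader. The 4-byte magic value and the enum names are assumptions, not the real formats.
{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class LogFormatDetector {

  public enum Format { CONCATENATABLE, CLASSIC_AGGREGATED }

  // Hypothetical magic header assumed to be written by the concatenatable writer.
  private static final byte[] CONCAT_MAGIC = {'C', 'A', 'L', 'F'};

  // Reads the first few bytes and decides which reader the CLI should use.
  public static Format detect(InputStream in) throws IOException {
    byte[] head = new byte[CONCAT_MAGIC.length];
    new DataInputStream(in).readFully(head);
    return Arrays.equals(head, CONCAT_MAGIC)
        ? Format.CONCATENATABLE        // new format from YARN-3218
        : Format.CLASSIC_AGGREGATED;   // fall back to the existing AggregatedLogFormat
  }
}
{code}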




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3218) Implement ConcatenatableAggregatedLogFormat Reader and Writer

2015-05-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3218:

Description: We need to create a Reader and Writer for the 
{{ConcatenatableAggregatedLogFormat}}  (was: We need to create a Reader and 
Writer for the ConcatenatableAggregatedLogFormat)

> Implement ConcatenatableAggregatedLogFormat Reader and Writer
> -
>
> Key: YARN-3218
> URL: https://issues.apache.org/jira/browse/YARN-3218
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-3218.001.patch
>
>
> We need to create a Reader and Writer for the 
> {{ConcatenatableAggregatedLogFormat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3728) Add an rmadmin command to compact concatenated aggregated logs

2015-05-27 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-3728:
---

 Summary: Add an rmadmin command to compact concatenated aggregated 
logs
 Key: YARN-3728
 URL: https://issues.apache.org/jira/browse/YARN-3728
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter


Create an {{rmadmin}} command to compact any concatenated aggregated log files 
it finds in the aggregated logs directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3220) Create a Service in the RM to concatenate aggregated logs

2015-05-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3220:

Description: Create an {{RMAggregatedLogsConcatenationService}} in the RM 
that will concatenate the aggregated log files written by the NM (which are in 
the new {{ConcatenatableAggregatedLogFormat}} format) when an application finishes. 
 (was: The JHS should read the Combined Aggregated Log files created by 
YARN-3219 when the user asks it for logs.  When unavailable, it should fallback 
to the regular Aggregated Log files (the current behavior).)
Summary: Create a Service in the RM to concatenate aggregated logs  
(was: JHS should display Combined Aggregated Logs when available)

> Create a Service in the RM to concatenate aggregated logs
> -
>
> Key: YARN-3220
> URL: https://issues.apache.org/jira/browse/YARN-3220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>
> Create an {{RMAggregatedLogsConcatenationService}} in the RM that will 
> concatenate the aggregated log files written by the NM (which are in the new 
> {{ConcatenatableAggregatedLogFormat}} format) when an application finishes.
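Purely as an illustration of the concatenation pass itself (the real service would go through HDFS and the ConcatenatableAggregatedLogFormat writer, neither of which is shown in this thread), a minimal sketch using plain java.nio, assuming the combined file lives outside the per-app source directory:
{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppLogConcatenator {

  // Appends every per-NM aggregated log file for one finished application
  // into a single combined file, back to back.
  public static void concatenate(Path appLogDir, Path combinedOut) throws IOException {
    try (OutputStream out = Files.newOutputStream(combinedOut);
         DirectoryStream<Path> parts = Files.newDirectoryStream(appLogDir)) {
      for (Path part : parts) {
        Files.copy(part, out); // copy each NM's file into the combined stream
      }
    }
  }
}
{code}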



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3219) Modify the NM to write logs using the ConcatenatableAggregatedLogFormat

2015-05-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3219:

Description: The NodeManager should use the 
{{ConcatenatableAggregatedLogFormat}} from YARN-3218 instead of the 
{{AggregatedLogFormat}} for writing aggregated log files to HDFS.  (was: The 
NodeManager should use the {{CombinedAggregatedLogFormat}} from YARN-3218 to 
append its aggregated log to the per-app log file.)
Summary: Modify the NM to write logs using the 
ConcatenatableAggregatedLogFormat  (was: Use CombinedAggregatedLogFormat Writer 
to combine aggregated log files)

> Modify the NM to write logs using the ConcatenatableAggregatedLogFormat
> ---
>
> Key: YARN-3219
> URL: https://issues.apache.org/jira/browse/YARN-3219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>
> The NodeManager should use the {{ConcatenatableAggregatedLogFormat}} from 
> YARN-3218 instead of the {{AggregatedLogFormat}} for writing aggregated log 
> files to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3218) Implement ConcatenatableAggregatedLogFormat Reader and Writer

2015-05-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3218:

Description: We need to create a Reader and Writer for the 
ConcatenatableAggregatedLogFormat  (was: We need to create a Reader and Writer 
for the CombinedAggregatedLogFormat)
Summary: Implement ConcatenatableAggregatedLogFormat Reader and Writer  
(was: Implement CombinedAggregatedLogFormat Reader and Writer)

> Implement ConcatenatableAggregatedLogFormat Reader and Writer
> -
>
> Key: YARN-3218
> URL: https://issues.apache.org/jira/browse/YARN-3218
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-3218.001.patch
>
>
> We need to create a Reader and Writer for the 
> ConcatenatableAggregatedLogFormat



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread gu-chi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562180#comment-14562180
 ] 

gu-chi commented on YARN-3678:
--

I have opened a pull request for this: https://github.com/apache/hadoop/pull/20/

> DelayedProcessKiller may kill other process other than container
> 
>
> Key: YARN-3678
> URL: https://issues.apache.org/jira/browse/YARN-3678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: gu-chi
>Priority: Critical
>
> Suppose one container has finished and its cleanup starts. The PID file still 
> exists and will trigger a signalContainer, which kills the process with the pid 
> recorded in the PID file. Since the container has already finished, that pid may 
> have been reused by another process, and killing it can cause serious issues.
> In our case the NM was killed unexpectedly; the scenario described above can be 
> the cause, even if it occurs rarely.
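One possible guard against the pid-reuse scenario described above, shown only as a sketch (not the attached pull request) and assuming a Linux /proc layout: before signalling the pid from the PID file, check that the live process still looks like the container's launch command.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SafeContainerKiller {

  // Returns true only if the process with this pid still carries the expected
  // marker (e.g. the container id or launch script path) in its command line.
  public static boolean looksLikeContainer(long pid, String expectedMarker) throws IOException {
    Path cmdline = Paths.get("/proc", Long.toString(pid), "cmdline");
    if (!Files.exists(cmdline)) {
      return false; // process already gone, nothing to kill
    }
    // /proc/<pid>/cmdline is NUL-separated; flatten it before matching.
    String cmd = new String(Files.readAllBytes(cmdline)).replace('\0', ' ');
    return cmd.contains(expectedMarker);
  }
}
{code}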



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-05-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Attachment: YARN-3727.000.patch

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happened due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-05-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Description: 
For better error recovery, check if the directory exists before using it for 
localization.
We saw the following localization failure happened due to existing cache 
directories.
{code}
2015-05-11 18:59:59,756 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, null 
}, Rename cannot overwrite non empty destination directory 
//8/yarn/nm/usercache//filecache/21637
2015-05-11 18:59:59,756 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
 transitioned from DOWNLOADING to FAILED
{code}

The real cause for this failure may be disk failure, LevelDB operation failure 
for {{startResourceLocalization}}/{{finishResourceLocalization}} or others.

I wonder whether we can add error recovery code to avoid the localization 
failure by not using the existing cache directories for localization.

The exception happened at {{files.rename(dst_work, destDirPath, 
Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after the 
exception, the existing cache directory used by {{LocalizedResource}} will be 
deleted.
{code}
try {
 .
  files.rename(dst_work, destDirPath, Rename.OVERWRITE);
} catch (Exception e) {
  try {
files.delete(destDirPath, true);
  } catch (IOException ignore) {
  }
  throw e;
} finally {
{code}

Since the conflicting local directory will be deleted after localization 
failure,
I think it will be better to check if the directory exists before using it for 
localization to avoid the localization failure.

  was:
For better error recovery, check if the directory exists before using it for 
localization.
We saw the following localization failure happened due to existing cache 
directories.
{code}
2015-05-11 18:59:59,756 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, null 
}, Rename cannot overwrite non empty destination directory 
//8/yarn/nm/usercache//filecache/21637
2015-05-11 18:59:59,756 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
 transitioned from DOWNLOADING to FAILED
{code}

The real cause for this failure may be disk failure, LevelDB operation failure 
for {{startResourceLocalization}}/{{finishResourceLocalization}} or others.

I wonder whether we can add error recovery code to avoid the localization 
failure by not using the existing cache directories for localization.

The exception happened at {{files.rename(dst_work, destDirPath, 
Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after the 
exception, the existing cache directory used by {{LocalizedResource}} will be 
deleted.
{{code}}
try {
 .
  files.rename(dst_work, destDirPath, Rename.OVERWRITE);
} catch (Exception e) {
  try {
files.delete(destDirPath, true);
  } catch (IOException ignore) {
  }
  throw e;
} finally {
{{code}}

Since the conflicting local directory will be deleted after localization 
failure,
I think it will be better to check if the directory exists before using it for 
localization to avoid the localization failure.


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happened due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/2

[jira] [Created] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-05-27 Thread zhihai xu (JIRA)
zhihai xu created YARN-3727:
---

 Summary: For better error recovery, check if the directory exists 
before using it for localization.
 Key: YARN-3727
 URL: https://issues.apache.org/jira/browse/YARN-3727
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu


For better error recovery, check if the directory exists before using it for 
localization.
We saw the following localization failure happened due to existing cache 
directories.
{code}
2015-05-11 18:59:59,756 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, null 
}, Rename cannot overwrite non empty destination directory 
//8/yarn/nm/usercache//filecache/21637
2015-05-11 18:59:59,756 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
 transitioned from DOWNLOADING to FAILED
{code}

The real cause for this failure may be disk failure, LevelDB operation failure 
for {{startResourceLocalization}}/{{finishResourceLocalization}} or others.

I wonder whether we can add error recovery code to avoid the localization 
failure by not using the existing cache directories for localization.

The exception happened at {{files.rename(dst_work, destDirPath, 
Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after the 
exception, the existing cache directory used by {{LocalizedResource}} will be 
deleted.
{code}
try {
 .
  files.rename(dst_work, destDirPath, Rename.OVERWRITE);
} catch (Exception e) {
  try {
files.delete(destDirPath, true);
  } catch (IOException ignore) {
  }
  throw e;
} finally {
{code}

Since the conflicting local directory will be deleted after localization 
failure,
I think it will be better to check if the directory exists before using it for 
localization to avoid the localization failure.
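A minimal sketch of the proposed check, with java.nio standing in for the NM's local FileContext (this is not the attached patch): before handing a cache directory such as .../filecache/21637 to the downloader, verify that it does not already exist, and pick the next unused id if it does, rather than letting the later rename fail with "non empty destination".
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalCacheDirPicker {

  // Advances past any leftover directories (stale state from disk or
  // LevelDB recovery issues) so the final rename cannot collide.
  public static Path pickFreshCacheDir(Path cacheRoot, long startingId) {
    long id = startingId;
    Path candidate = cacheRoot.resolve(Long.toString(id));
    while (Files.exists(candidate)) {
      id++;
      candidate = cacheRoot.resolve(Long.toString(id));
    }
    return candidate;
  }

  public static void main(String[] args) throws IOException {
    Path dir = pickFreshCacheDir(Paths.get("/tmp/nm-local/filecache"), 21637L);
    Files.createDirectories(dir); // the localizer would then download into this directory
    System.out.println("localize into " + dir);
  }
}
{code}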



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3726) Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562165#comment-14562165
 ] 

Hadoop QA commented on YARN-3726:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m 25s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  10m 17s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 28s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 53s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 45s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 42s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 16s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  22m 32s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735742/YARN-3726-YARN-2928.001.patch
 |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | YARN-2928 / e19566a |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8106/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8106/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8106/console |


This message was automatically generated.

> Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data
> --
>
> Key: YARN-3726
> URL: https://issues.apache.org/jira/browse/YARN-3726
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3726-YARN-2928.001.patch
>
>
> There is a very fascinating  bug that was introduced by the test data in the 
> metrics time series check in the unit test in TestHBaseTimelineWriterImpl in 
> YARN-3411. 
> The unit test failure seen is 
> {code}
> Error Message
> expected:<1> but was:<6>
> Stacktrace
> java.lang.AssertionError: expected:<1> but was:<6>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.checkMetricsTimeseries(TestHBaseTimelineWriterImpl.java:219)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.testWriteEntityToHBase(TestHBaseTimelineWriterImpl.java:204)
> {code}
> The test data had 6 timestamps that belonged to 22nd April 2015. When the 
> patch in YARN-3411 was submitted and tested by Hadoop QA on May 19th, the 
> unit test was working fine. Fast forward a few more days and the test started 
> failing. There has been no relevant code change or package version change 
> interim. The change that is triggering the unit test failure is the passage 
> of time.
> The reason for test failure is that the metrics time series data lives in a 
> column family which has a TTL set to 30 days. Metrics time series data was 
> written to the mini hbase cluster with cell timestamps set to April 22nd. 
> Based on the column family configuration, hbase started deleting the data 
> that was older than 30 days and the test started failing. The last value is 
> retained, hence there is one value fetched from hbase. 
> Will submit a patch with the test case fixed shortly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3726) Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data

2015-05-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562136#comment-14562136
 ] 

Sangjin Lee commented on YARN-3726:
---

LGTM too. Once the jenkins comes back green (and unless there is an objection), 
I'll commit the patch.

> Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data
> --
>
> Key: YARN-3726
> URL: https://issues.apache.org/jira/browse/YARN-3726
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3726-YARN-2928.001.patch
>
>
> There is a very fascinating  bug that was introduced by the test data in the 
> metrics time series check in the unit test in TestHBaseTimelineWriterImpl in 
> YARN-3411. 
> The unit test failure seen is 
> {code}
> Error Message
> expected:<1> but was:<6>
> Stacktrace
> java.lang.AssertionError: expected:<1> but was:<6>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.checkMetricsTimeseries(TestHBaseTimelineWriterImpl.java:219)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.testWriteEntityToHBase(TestHBaseTimelineWriterImpl.java:204)
> {code}
> The test data had 6 timestamps that belonged to 22nd April 2015. When the 
> patch in YARN-3411 was submitted and tested by Hadoop QA on May 19th, the 
> unit test was working fine. Fast forward a few more days and the test started 
> failing. There has been no relevant code change or package version change 
> interim. The change that is triggering the unit test failure is the passage 
> of time.
> The reason for test failure is that the metrics time series data lives in a 
> column family which has a TTL set to 30 days. Metrics time series data was 
> written to the mini hbase cluster with cell timestamps set to April 22nd. 
> Based on the column family configuration, hbase started deleting the data 
> that was older than 30 days and the test started failing. The last value is 
> retained, hence there is one value fetched from hbase. 
> Will submit a patch with the test case fixed shortly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-27 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong reassigned YARN-3644:
--

Assignee: Jun Gong  (was: Raju Bairishetti)

> Node manager shuts down if unable to connect with RM
> 
>
> Key: YARN-3644
> URL: https://issues.apache.org/jira/browse/YARN-3644
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Srikanth Sundarrajan
>Assignee: Jun Gong
> Attachments: YARN-3644.001.patch, YARN-3644.patch
>
>
> When NM is unable to connect to RM, NM shuts itself down.
> {code}
>   } catch (ConnectException e) {
> //catch and throw the exception if tried MAX wait time to connect 
> RM
> dispatcher.getEventHandler().handle(
> new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
> throw new YarnRuntimeException(e);
> {code}
> In large clusters, if the RM is down for maintenance for a longer period, all
> the NMs shut themselves down, requiring additional work to bring the NMs back up.
> Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects:
> non-connection failures are retried infinitely by all YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-05-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3044:

Attachment: YARN-3044-YARN-2928.009.patch

Hi [~zjshen],
I have attached a patch that removes the creation of the AsyncDispatcher for V2.


> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
> YARN-3044-YARN-2928.009.patch, YARN-3044.20150325-1.patch, 
> YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread gu-chi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gu-chi updated YARN-3678:
-
Attachment: (was: YARN-3678.patch)

> DelayedProcessKiller may kill other process other than container
> 
>
> Key: YARN-3678
> URL: https://issues.apache.org/jira/browse/YARN-3678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: gu-chi
>Priority: Critical
>
> Suppose one container has finished and its cleanup starts. The PID file still 
> exists and will trigger a signalContainer, which kills the process with the pid 
> recorded in the PID file. Since the container has already finished, that pid may 
> have been reused by another process, and killing it can cause serious issues.
> In our case the NM was killed unexpectedly; the scenario described above can be 
> the cause, even if it occurs rarely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3712) ContainersLauncher: handle event CLEANUP_CONTAINER asynchronously

2015-05-27 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3712:
---
Priority: Minor  (was: Major)

> ContainersLauncher: handle event CLEANUP_CONTAINER asynchronously
> -
>
> Key: YARN-3712
> URL: https://issues.apache.org/jira/browse/YARN-3712
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Minor
> Attachments: YARN-3712.01.patch, YARN-3712.02.patch
>
>
> Handling the CLEANUP_CONTAINER event asynchronously will save some time. This 
> improvement is useful when cleaning up a container takes a fairly long time 
> (e.g., in our case we run Docker containers on the NM, and it takes more than 
> 1 second to clean up one Docker container) and when there are many containers 
> to clean up (e.g., the NM needs to clean up all running containers when it 
> shuts down).
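A minimal sketch of the asynchronous shape being proposed (not the attached patch): cleanup requests are queued onto a small thread pool so a slow Docker cleanup does not block the event-handling thread. The container id string and the pool size are placeholder assumptions.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncContainerCleanup {

  private final ExecutorService cleanupPool = Executors.newFixedThreadPool(4);

  // Called from the event dispatcher; returns immediately instead of blocking
  // for the ~1 second a Docker cleanup can take.
  public void onCleanupContainerEvent(String containerId) {
    cleanupPool.submit(() -> cleanupContainer(containerId));
  }

  private void cleanupContainer(String containerId) {
    // Placeholder for the real work (signal the container, remove local resources).
    System.out.println("cleaning up " + containerId);
  }

  // On NM shutdown, drain the queued cleanups before exiting.
  public void stop() throws InterruptedException {
    cleanupPool.shutdown();
    cleanupPool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}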



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3066) Hadoop leaves orphaned tasks running after job is killed

2015-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562114#comment-14562114
 ] 

Allen Wittenauer commented on YARN-3066:


Is Bugtraq+ (if that is what it is still called... haven't been a Sun employee 
for a while...)  still sealed off?

> Hadoop leaves orphaned tasks running after job is killed
> 
>
> Key: YARN-3066
> URL: https://issues.apache.org/jira/browse/YARN-3066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: Hadoop 2.4.1 (probably all later too), FreeBSD-10.1
>Reporter: Dmitry Sivachenko
>
> When spawning a user task, the node manager checks for the setsid(1) utility and 
> spawns the task program via it. See 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
>  for instance:
> String exec = Shell.isSetsidAvailable? "exec setsid" : "exec";
> FreeBSD, unlike Linux, does not have the setsid(1) utility, so plain "exec" is 
> used to spawn the user task. If that task spawns other external programs (a 
> common case when the task program is a shell script) and the user kills the job 
> via mapred job -kill , these child processes remain running.
> 1) Why do you silently ignore the absence of setsid(1) and spawn the task 
> process via exec? This guarantees orphaned processes when a job is prematurely 
> killed.
> 2) FreeBSD has a third-party replacement program called ssid (which does almost 
> the same as Linux's setsid). It would be nice to detect which binary is present 
> during the configure stage and put an @SETSID@ macro into the java file so the 
> correct name is used.
> I propose to make the Shell.isSetsidAvailable test stricter and fail to start 
> if no such utility is found: at least we will know about the problem at startup 
> rather than guess why there are orphaned tasks running forever.
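An illustrative sketch of the "fail fast" proposal (not Hadoop's actual Shell class): probe the PATH for setsid on Linux or the third-party ssid mentioned above on FreeBSD, and refuse to start when neither is found instead of silently falling back to plain exec.
{code}
import java.io.IOException;

public class SetsidProbe {

  // Returns the first session-wrapper binary found, or fails the startup.
  public static String findSessionWrapper() {
    for (String candidate : new String[] {"setsid", "ssid"}) {
      if (isOnPath(candidate)) {
        return candidate;
      }
    }
    throw new IllegalStateException(
        "Neither setsid nor ssid found; refusing to start to avoid orphaned tasks");
  }

  private static boolean isOnPath(String binary) {
    try {
      // "which" exits with 0 when the binary is resolvable on PATH.
      return new ProcessBuilder("which", binary).start().waitFor() == 0;
    } catch (IOException | InterruptedException e) {
      return false;
    }
  }
}
{code}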



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3726) Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data

2015-05-27 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562115#comment-14562115
 ] 

Joep Rottinghuis commented on YARN-3726:


+1 applying the patch resolves the unit test failure (in Eclipse) for me.

> Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data
> --
>
> Key: YARN-3726
> URL: https://issues.apache.org/jira/browse/YARN-3726
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3726-YARN-2928.001.patch
>
>
> There is a very fascinating  bug that was introduced by the test data in the 
> metrics time series check in the unit test in TestHBaseTimelineWriterImpl in 
> YARN-3411. 
> The unit test failure seen is 
> {code}
> Error Message
> expected:<1> but was:<6>
> Stacktrace
> java.lang.AssertionError: expected:<1> but was:<6>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.checkMetricsTimeseries(TestHBaseTimelineWriterImpl.java:219)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.testWriteEntityToHBase(TestHBaseTimelineWriterImpl.java:204)
> {code}
> The test data had 6 timestamps that belonged to 22nd April 2015. When the 
> patch in YARN-3411 was submitted and tested by Hadoop QA on May 19th, the 
> unit test was working fine. Fast forward a few more days and the test started 
> failing. There has been no relevant code change or package version change 
> interim. The change that is triggering the unit test failure is the passage 
> of time.
> The reason for test failure is that the metrics time series data lives in a 
> column family which has a TTL set to 30 days. Metrics time series data was 
> written to the mini hbase cluster with cell timestamps set to April 22nd. 
> Based on the column family configuration, hbase started deleting the data 
> that was older than 30 days and the test started failing. The last value is 
> retained, hence there is one value fetched from hbase. 
> Will submit a patch with the test case fixed shortly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container

2015-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562096#comment-14562096
 ] 

Hudson commented on YARN-2355:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7911 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7911/])
YARN-2355. MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a 
container (Darrell Taylor via aw) (aw: rev 
d6e3164d4a18271299c63377326ca56e8a980830)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java


> MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
> --
>
> Key: YARN-2355
> URL: https://issues.apache.org/jira/browse/YARN-2355
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Darrell Taylor
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: YARN-2355.001.patch
>
>
> After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether it 
> will get another attempt based on MAX_APP_ATTEMPTS_ENV alone. We should be 
> able to notify the application of the up-to-date remaining retry quota.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562094#comment-14562094
 ] 

Hudson commented on YARN-3700:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7911 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7911/])
YARN-3700. Made generic history service load a number of latest applications 
according to the parameter or the configuration. Contributed by Xuan Gong. 
(zjshen: rev 54504133f41e36eaea6bb06c7b9ddb249468ecd7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/YarnWebParams.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java


> ATS Web Performance issue at load time when large number of jobs
> 
>
> Key: YARN-3700
> URL: https://issues.apache.org/jira/browse/YARN-3700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, 
> YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch
>
>
> Currently, we load all the apps when we try to load the yarn timelineservice 
> web page. If we have a large number of jobs, it will be very slow.
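As a toy illustration of the shape of the fix (not the actual WebServices/AppsBlock code): rather than rendering every application, sort by start time and keep only the newest N, where N comes from a request parameter or a configured default. All names below are assumptions.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LatestAppsLimiter {

  // Assumed default; the real limit would come from a request parameter or yarn-default.xml.
  static final int DEFAULT_APPS_TO_SHOW = 100;

  // Returns only the newest 'limit' entries instead of the full application list.
  public static <T> List<T> latest(List<T> apps, Comparator<T> byStartTime, int limit) {
    List<T> sorted = new ArrayList<>(apps);
    sorted.sort(byStartTime.reversed()); // newest first
    return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
  }
}
{code}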



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-05-27 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562083#comment-14562083
 ] 

Robert Kanter commented on YARN-3528:
-

[~brahmareddy] are you still planning on working on this?

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible for scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows many places in the test suite where this 
> practice has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through port scanning and dynamic port allocation. 
> Can someone please do this?
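The usual remedy, shown here only as a generic sketch rather than the eventual patch, is to let the OS hand out an ephemeral port and pass that to the service under test instead of hard-coding 12345:
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {

  // Binding to port 0 asks the kernel for any currently free ephemeral port.
  public static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("start the test service on port " + findFreePort());
  }
}
{code}
There is a small window in which another process could grab the port after the probe socket is closed, which is why the most robust tests bind the service itself to port 0 and then ask it which port it received.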



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562080#comment-14562080
 ] 

Li Lu commented on YARN-3721:
-

Yes, I agree. We do have the required module because we directly depend on 
hadoop-hdfs:tests. 

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562078#comment-14562078
 ] 

Naganarasimha G R commented on YARN-3581:
-

Thanks [~wangda] for reviewing and committing the patch.

> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we no 
> longer need this option: an admin can configure queue settings, start the RM, 
> and configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin configured labels to be stored 
> on local disk.
> Second, if an admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could make the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562075#comment-14562075
 ] 

Hadoop QA commented on YARN-3721:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 36s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 55s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | yarn tests |   1m 14s | Tests failed in 
hadoop-yarn-server-timelineservice. |
| | |  37m 33s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735700/YARN-3721-YARN-2928.002.patch
 |
| Optional Tests | javadoc javac unit |
| git revision | YARN-2928 / e19566a |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8105/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8105/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8105/console |


This message was automatically generated.

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562066#comment-14562066
 ] 

Sangjin Lee commented on YARN-3721:
---

OK, I think I have an idea. Starting the mini HBase cluster depends on the mini 
DFS cluster, and that is provided by hadoop-hdfs:tests. We have a direct 
dependency on hadoop-hdfs:tests in the timelineservice pom.xml. So even if we 
exclude the mini hadoop cluster dependency, we're covered, because we provide 
the needed dependencies more or less directly.
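A rough illustration of why the HDFS test jars still need to be on the test classpath even after excluding the mini hadoop cluster artifact: bringing up the mini HBase cluster spins up a mini DFS cluster underneath it. This is just the common HBaseTestingUtility usage pattern, not the module's actual test code.
{code}
import org.apache.hadoop.hbase.HBaseTestingUtility;

public class MiniClusterSmokeTest {

  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    // Internally launches a mini DFS cluster (classes from the hadoop-hdfs test
    // artifact) and then the HBase master/regionserver processes on top of it.
    util.startMiniCluster();
    try {
      System.out.println("hbase.rootdir = " + util.getConfiguration().get("hbase.rootdir"));
    } finally {
      util.shutdownMiniCluster();
    }
  }
}
{code}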

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3726) Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data

2015-05-27 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3726:
-
Attachment: YARN-3726-YARN-2928.001.patch


Attaching patch YARN-3726-YARN-2928.001.patch which generates the timestamps 
based on current time. 
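The gist of the fix, as a hedged sketch rather than the patch itself: derive the series timestamps from the current time so the cells can never age past the column family's 30-day TTL while the test is running. The method name and values below are illustrative.
{code}
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

public class MetricTimeseriesFixture {

  // Builds 'points' metric values whose timestamps are all recent,
  // well inside a 30-day TTL window.
  public static NavigableMap<Long, Number> recentTimeseries(int points) {
    NavigableMap<Long, Number> ts = new TreeMap<>();
    long now = System.currentTimeMillis();
    for (int i = 0; i < points; i++) {
      ts.put(now - TimeUnit.MINUTES.toMillis(i), 100 + i); // one minute apart
    }
    return ts;
  }

  public static void main(String[] args) {
    for (Map.Entry<Long, Number> e : recentTimeseries(6).entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
{code}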

> Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data
> --
>
> Key: YARN-3726
> URL: https://issues.apache.org/jira/browse/YARN-3726
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3726-YARN-2928.001.patch
>
>
> There is a very fascinating  bug that was introduced by the test data in the 
> metrics time series check in the unit test in TestHBaseTimelineWriterImpl in 
> YARN-3411. 
> The unit test failure seen is 
> {code}
> Error Message
> expected:<1> but was:<6>
> Stacktrace
> java.lang.AssertionError: expected:<1> but was:<6>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.checkMetricsTimeseries(TestHBaseTimelineWriterImpl.java:219)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.testWriteEntityToHBase(TestHBaseTimelineWriterImpl.java:204)
> {code}
> The test data had 6 timestamps that belonged to 22nd April 2015. When the 
> patch in YARN-3411 was submitted and tested by Hadoop QA on May 19th, the 
> unit test was working fine. Fast forward a few more days and the test started 
> failing. There has been no relevant code change or package version change 
> interim. The change that is triggering the unit test failure is the passage 
> of time.
> The reason for test failure is that the metrics time series data lives in a 
> column family which has a TTL set to 30 days. Metrics time series data was 
> written to the mini hbase cluster with cell timestamps set to April 22nd. 
> Based on the column family configuration, hbase started deleting the data 
> that was older than 30 days and the test started failing. The last value is 
> retained, hence there is one value fetched from hbase. 
> Will submit a patch with the test case fixed shortly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562048#comment-14562048
 ] 

Sangjin Lee commented on YARN-3721:
---

[~gtCarrera9], I'm a little confused.

You said

bq. The mini hbase cluster depends on mini hadoop cluster to launch hdfs servers

but the patch excludes the mini hadoop cluster from the hbase-testing-util 
module dependency. So do we or don't we need the mini hadoop cluster dependency?

FWIW, I can confirm that with your patch the build works fine and the unit 
tests are able to bring up the mini hbase cluster.

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3170) YARN architecture document needs updating

2015-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562037#comment-14562037
 ] 

Allen Wittenauer commented on YARN-3170:


{code}
and job scheduling/monitoring, 
{code}

Unnecessary comma.

{code}
The Scheduler has a pluggable policy plug-in,
{code}

Awkward phrasing.  Unnecessary comma.

{code}
In the first version, only memory is supported.
{code}

Remove.  It isn't true anymore and highlights one of the things I absolutely 
hate about Hadoop documentation:  roadmaps don't belong here. 

{code}
The current MapReduce schedulers such as the CapacityScheduler and the 
FairScheduler would be some examples of the plug-in.
{code}

"MapReduce" should be removed.  "of plug-ins"; no the & make it plural.  Also 
link to their respective documentation pages.

{code}
The CapacityScheduler supports hierarchical queues to allow for more 
predictable sharing of cluster resources
{code}

Drop this sentence.



> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, esp given it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: YARN-1197 old-design-docs-patches-for-reference.zip

Moved the old design docs / patches into a zip file to avoid unnecessary noise.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes the resources allocated to a 
> container are fixed during its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and allocate 
> a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-server-common.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, 
> yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resources allocated to a 
> container are fixed during its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and allocate 
> a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-server-nodemanager.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-1197-v5.pdf)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-api-protocol.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-server-resourcemanager.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-pb-impl.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, 
> yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-1197-v2.pdf)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: tools-project.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-1197-v3.pdf)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-1197-v4.pdf)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-1197-scheduler-v1.pdf)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: yarn-1197.pdf)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-
Attachment: (was: mapreduce-project.patch.ver.1)

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, tools-project.patch.ver.1, 
> yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, 
> yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> applications better control over resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3170) YARN architecture document needs updating

2015-05-27 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-3170:
---
Labels:   (was: BB2015-05-TBR)

> YARN architecture document needs updating
> -
>
> Key: YARN-3170
> URL: https://issues.apache.org/jira/browse/YARN-3170
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Allen Wittenauer
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3170-002.patch, YARN-3170-003.patch, 
> YARN-3170-004.patch, YARN-3170-005.patch, YARN-3170-006.patch, 
> YARN-3170-007.patch, YARN-3170-008.patch, YARN-3170.patch
>
>
> The marketing paragraph at the top, "NextGen MapReduce", etc., are all 
> marketing rather than actual descriptions. It also needs some general 
> updates, especially given that it reads as though 0.23 was just released yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562007#comment-14562007
 ] 

Vrushali C commented on YARN-3721:
--

I looked into the unit test failure for TestHBaseTimelineWriterImpl. I discovered 
a rather fascinating bug in my test data and have filed YARN-3726 to track it. 

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3726) Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data

2015-05-27 Thread Vrushali C (JIRA)
Vrushali C created YARN-3726:


 Summary: Fix TestHBaseTimelineWriterImpl unit test failure by 
fixing its test data
 Key: YARN-3726
 URL: https://issues.apache.org/jira/browse/YARN-3726
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vrushali C
Assignee: Vrushali C



There is a rather fascinating bug that was introduced by the test data used in 
the metrics time series check in the TestHBaseTimelineWriterImpl unit test added 
in YARN-3411. 

The unit test failure seen is 
{code}
Error Message
expected:<1> but was:<6>
Stacktrace
java.lang.AssertionError: expected:<1> but was:<6>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.checkMetricsTimeseries(TestHBaseTimelineWriterImpl.java:219)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl.testWriteEntityToHBase(TestHBaseTimelineWriterImpl.java:204)

{code}

The test data had 6 timestamps that all belonged to 22nd April 2015. When the 
patch in YARN-3411 was submitted and tested by Hadoop QA on May 19th, the unit 
test was working fine. Fast forward a few more days and the test started failing. 
There has been no relevant code change or package version change in the interim. 
The change that is triggering the unit test failure is the passage of time.

The reason for the test failure is that the metrics time series data lives in a 
column family which has a TTL set to 30 days. The metrics time series data was 
written to the mini HBase cluster with cell timestamps set to April 22nd. Based 
on the column family configuration, HBase started deleting the data that was 
older than 30 days and the test started failing. The last value is retained, 
hence there is one value fetched from HBase. 

Will submit a patch with the test case fixed shortly. 
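
As an aside, here is a minimal sketch (not the actual patch) of one way to make 
such test data robust against the TTL: derive the cell timestamps from the 
current time rather than from a fixed calendar date. The class and method names 
below are purely illustrative.

{code}
import java.util.concurrent.TimeUnit;

// Purely illustrative: build metric timestamps relative to "now" so that the
// test cells always fall well inside a 30-day TTL on the metrics column family.
public class MetricTestDataSketch {

  static long[] recentTimestamps(int count) {
    long now = System.currentTimeMillis();
    long[] ts = new long[count];
    for (int i = 0; i < count; i++) {
      // Spread the data points over the last hour instead of a fixed date.
      ts[i] = now - TimeUnit.MINUTES.toMillis(10L * i);
    }
    return ts;
  }

  public static void main(String[] args) {
    for (long t : recentTimestamps(6)) {
      System.out.println(t);
    }
  }
}
{code}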




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561970#comment-14561970
 ] 

Hudson commented on YARN-3581:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7910 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7910/])
YARN-3581. Deprecate -directlyAccessNodeLabelStore in RMAdminCLI. 
(Naganarasimha G R via wangda) (wangda: rev 
cab7674e54c4fe56838668462de99a6787841309)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestClusterCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ClusterCLI.java


> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we no longer 
> need this option: an admin can configure queue settings, start the RM, and 
> configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin has configured labels to be 
> stored on local disk.
> Second, if an admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could leave the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits

2015-05-27 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2069:

Attachment: YARN-2069-2.6-11.patch

Uploading the 2.6 patch

Thanks,
Mayank

> CS queue level preemption should respect user-limits
> 
>
> Key: YARN-2069
> URL: https://issues.apache.org/jira/browse/YARN-2069
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Mayank Bansal
>  Labels: BB2015-05-TBR
> Attachments: YARN-2069-2.6-11.patch, YARN-2069-trunk-1.patch, 
> YARN-2069-trunk-10.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, 
> YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, 
> YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, YARN-2069-trunk-9.patch
>
>
> This is different from (even if related to, and likely to share code with) 
> YARN-2113.
> YARN-2113 focuses on making sure that even if a queue has its guaranteed 
> capacity, its individual users are treated in line with their limits 
> irrespective of when they join in.
> This JIRA is about respecting user limits while preempting containers to 
> balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3547) FairScheduler: Apps that have no resource demand should not participate scheduling

2015-05-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561924#comment-14561924
 ] 

Karthik Kambatla commented on YARN-3547:


Sorry for the delay in response here. I am okay with using 
SchedulerApplicationAttempt.getAppAttemptResourceUsage().getPending(). I don't 
anticipate any issues due to potential mismatch between this value and 
getDemand - getResourceUsage. We can always fix them if we run into them.
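
For illustration only, a minimal sketch of what such a pending-demand check could 
look like before an app is considered for scheduling. It relies on the 
getAppAttemptResourceUsage().getPending() call discussed above; the package paths 
and class name are written from memory rather than taken from any patch.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt;

// Illustrative only: skip apps that currently ask for nothing before the
// scheduler sorts and tries to fulfill the 'running' apps.
public final class PendingDemandFilterSketch {

  /** Returns true if the attempt has no pending resource demand at all. */
  static boolean hasNoPendingDemand(SchedulerApplicationAttempt attempt) {
    Resource pending = attempt.getAppAttemptResourceUsage().getPending();
    return pending.getMemory() == 0 && pending.getVirtualCores() == 0;
  }
}
{code}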

> FairScheduler: Apps that have no resource demand should not participate 
> scheduling
> --
>
> Key: YARN-3547
> URL: https://issues.apache.org/jira/browse/YARN-3547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-3547.001.patch, YARN-3547.002.patch, 
> YARN-3547.003.patch, YARN-3547.004.patch, YARN-3547.005.patch
>
>
> At present, all of the 'running' apps participate in the scheduling process; 
> however, on a production cluster most of them may have no resource demand, as 
> an app typically spends most of its lifetime running rather than waiting for 
> resources. It is not wise to sort all of the 'running' apps and try to fulfill 
> them, especially on a large-scale cluster with a heavy scheduling load. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3547) FairScheduler: Apps that have no resource demand should not participate scheduling

2015-05-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561894#comment-14561894
 ] 

Wangda Tan commented on YARN-3547:
--

[~kasha], do you have any concerns with using 
{{SchedulerApplicationAttempt.getAppAttemptResourceUsage().getPending()}}?

> FairScheduler: Apps that have no resource demand should not participate 
> scheduling
> --
>
> Key: YARN-3547
> URL: https://issues.apache.org/jira/browse/YARN-3547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-3547.001.patch, YARN-3547.002.patch, 
> YARN-3547.003.patch, YARN-3547.004.patch, YARN-3547.005.patch
>
>
> At present, all of the 'running' apps participate in the scheduling process; 
> however, on a production cluster most of them may have no resource demand, as 
> an app typically spends most of its lifetime running rather than waiting for 
> resources. It is not wise to sort all of the 'running' apps and try to fulfill 
> them, especially on a large-scale cluster with a heavy scheduling load. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561877#comment-14561877
 ] 

Hadoop QA commented on YARN-3700:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   2m 58s | Site still builds. |
| {color:red}-1{color} | checkstyle |   1m 57s | The applied patch generated  1 
new checkstyle issues (total was 211, now 212). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m  9s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-server-common. |
| | |  53m 28s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735671/YARN-3700.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 4102e58 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8104/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8104/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8104/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8104/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8104/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8104/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8104/console |


This message was automatically generated.

> ATS Web Performance issue at load time when large number of jobs
> 
>
> Key: YARN-3700
> URL: https://issues.apache.org/jira/browse/YARN-3700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, 
> YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch
>
>
> Currently, we load all the apps when we try to load the YARN timeline 
> service web page. If we have a large number of jobs, it will be very 
> slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561874#comment-14561874
 ] 

Wangda Tan commented on YARN-3581:
--

Thinking about this more, it is not a critical issue for 2.7.1, so I think it's 
better not to commit it to 2.7.1. Removed it from the target versions. Committing.

> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we no longer 
> need this option: an admin can configure queue settings, start the RM, and 
> configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin has configured labels to be 
> stored on local disk.
> Second, if an admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could leave the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3581:
-
Target Version/s: 2.8.0  (was: 2.8.0, 2.7.1)

> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we no longer 
> need this option: an admin can configure queue settings, start the RM, and 
> configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin has configured labels to be 
> stored on local disk.
> Second, if an admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could leave the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3721:

Attachment: YARN-3721-YARN-2928.002.patch

We are indeed depending on hbase-test-util since we need to launch a mini HBase 
cluster for both the HBase and Phoenix timeline writers. I checked with the HBase 
guys and it (hbase-test-util) seems to be the right module to depend on. The 
mini HBase cluster depends on the mini Hadoop cluster to launch HDFS servers, but 
so far I am not aware of any usages in the timeline server. 

In this patch I'm cleaning up the dependency relationships of hbase-test-util. 
I've moved the exclusion information from timelineserver's pom to the 
centralized pom to exclude the mini Hadoop cluster. I've also done some formatting 
work for the pom files w.r.t. the changes in YARN-3529. 

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch, YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3721:

Attachment: YARN-3721-YARN-2928.002.patch

We are indeed depending on hbase-test-util since we need to launch a mini HBase 
cluster for both the HBase and Phoenix timeline writers. I checked with the HBase 
guys and it (hbase-test-util) seems to be the right module to depend on. The 
mini HBase cluster depends on the mini Hadoop cluster to launch HDFS servers, but 
so far I am not aware of any usages in the timeline server. 

In this patch I'm cleaning up the dependency relationships of hbase-test-util. 
I've moved the exclusion information from timelineserver's pom to the 
centralized pom to exclude the mini Hadoop cluster. I've also done some formatting 
work for the pom files w.r.t. the changes in YARN-3529. 

> build is broken on YARN-2928 branch due to possible dependency cycle
> 
>
> Key: YARN-3721
> URL: https://issues.apache.org/jira/browse/YARN-3721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3721-YARN-2928.001.patch, 
> YARN-3721-YARN-2928.002.patch
>
>
> The build is broken on the YARN-2928 branch at the 
> hadoop-yarn-server-timelineservice module. It's been broken for a while, but 
> we didn't notice it because the build happens to work despite this if the 
> maven local cache is not cleared.
> To reproduce, remove all hadoop (3.0.0-SNAPSHOT) artifacts from your maven 
> local cache and build it.
> Almost certainly it was introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3723) Need to clearly document primaryFilter and otherInfo value type

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561806#comment-14561806
 ] 

Hadoop QA commented on YARN-3723:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   3m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m 18s | Site still builds. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   7m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735677/YARN-3723.1.patch |
| Optional Tests | site |
| git revision | trunk / 4102e58 |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8103/console |


This message was automatically generated.

> Need to clearly document primaryFilter and otherInfo value type
> ---
>
> Key: YARN-3723
> URL: https://issues.apache.org/jira/browse/YARN-3723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-3723.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty

2015-05-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561795#comment-14561795
 ] 

Zhijie Shen commented on YARN-3725:
---

I'm proposing to do the following:

1. Short term fix for 2.7.1: Check whether the service address in the timeline DT 
is empty. If it is, fall back to the configured service address. This will make 
app submission via the REST API work in secure mode without additional DT 
processing, unless users really want to renew the DT from somewhere other than 
the configured address. That shouldn't be common, as we usually set up only one 
timeline server per YARN cluster.

2. Long term fix: we can do something similar to HDFS-6904. Let the client pass 
in the service address, and set the token's service address on the server side 
before serializing it into a string. This problem is not limited to ATS either: 
the RM REST API doesn't set the service address for the RM DT. It's better to 
seek a common solution. For example, we can fix 
DelegationTokenAuthenticationHandler so that all use cases of the Hadoop HTTP 
auth component set the service address properly. One step further, even the RPC 
protocol may have a similar problem. For example, if we work with 
ApplicationClientProtocol directly, we should get an RM DT without a service 
address (correct me if I'm wrong).

Thoughts?
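
A rough sketch of the short-term fallback in item 1, purely for illustration; the 
configuration key and utility classes here are recalled from memory and are not 
taken from an actual patch.

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Illustrative only: if the timeline DT arrived with an empty service field
// (e.g. from the REST API), fall back to the configured timeline address.
public final class TimelineDtServiceFallbackSketch {

  static <T extends TokenIdentifier> void ensureServiceAddress(
      Token<T> timelineDt, Configuration conf) {
    Text service = timelineDt.getService();
    if (service == null || service.toString().isEmpty()) {
      // Assumed config key for the timeline service address.
      InetSocketAddress addr =
          NetUtils.createSocketAddr(conf.get("yarn.timeline-service.address"));
      timelineDt.setService(SecurityUtil.buildTokenService(addr));
    }
  }
}
{code}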

> App submission via REST API is broken in secure mode due to Timeline DT 
> service address is empty
> 
>
> Key: YARN-3725
> URL: https://issues.apache.org/jira/browse/YARN-3725
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Blocker
>
> YARN-2971 changes TimelineClient to use the service address from the Timeline DT 
> to renew the DT instead of the configured address. This breaks the procedure of 
> submitting a YARN app via the REST API in secure mode.
> The problem is that the service address is set by the client instead of the 
> server in the Java code path. The REST API response is an encoded token String, 
> so it is inconvenient to deserialize it, set the service address, and 
> serialize it again. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object

2015-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561792#comment-14561792
 ] 

Hudson commented on YARN-3647:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7909 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7909/])
YARN-3647. RMWebServices api's should use updated api from 
CommonNodeLabelsManager to get NodeLabel object. (Sunil G via wangda) (wangda: 
rev ec0a852a37d5c91a62d3d0ff3ddbd9d58235b312)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


> RMWebServices api's should use updated api from CommonNodeLabelsManager to 
> get NodeLabel object
> ---
>
> Key: YARN-3647
> URL: https://issues.apache.org/jira/browse/YARN-3647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch
>
>
> After YARN-3579, the RMWebServices APIs can use the updated APIs in 
> CommonNodeLabelsManager, which return the full NodeLabel object instead of 
> creating a NodeLabel object from a plain label name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be

2015-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561790#comment-14561790
 ] 

Hudson commented on YARN-3626:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7909 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7909/])
YARN-3626. On Windows localized resources are not moved to the front of the 
classpath when they should be. Contributed by Craig Welch. (cnauroth: rev 
4102e5882e17b75507ae5cf8b8979485b3e24cbc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java


> On Windows localized resources are not moved to the front of the classpath 
> when they should be
> --
>
> Key: YARN-3626
> URL: https://issues.apache.org/jira/browse/YARN-3626
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: Windows
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.7.1
>
> Attachments: YARN-3626.0.patch, YARN-3626.11.patch, 
> YARN-3626.14.patch, YARN-3626.15.patch, YARN-3626.16.patch, 
> YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch
>
>
> In response to the mapreduce.job.user.classpath.first setting, the classpath 
> is ordered differently so that localized resources will appear before system 
> classpath resources when tasks execute.  On Windows this does not work 
> because the localized resources are not linked into their final location when 
> the classpath jar is created.  To compensate for that, localized jar resources 
> are added directly to the classpath generated for the jar rather than being 
> discovered from the localized directories.  Unfortunately, they are always 
> appended to the classpath, and so are never preferred over system resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-05-27 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561770#comment-14561770
 ] 

Vrushali C commented on YARN-3051:
--

Hi Varun,

Good points.. My answers inline.
bq. We can either return a single timeline entity for a flow ID (having 
aggregated metric values) or multiple entities indicating multiple flow runs 
for a flow ID. I have included an API for the former as of now. I think there 
can be use cases for both though. Vrushali C, did hRaven have the facility for 
both kinds of queries? I mean, is there a known use case?

Yes, there are use cases for both. hRaven has APIs for both types of calls, 
although they are named differently. The /flow endpoint in hRaven will return 
multiple flow runs (limited by filters). The /summary endpoint will return 
aggregated values for all the runs of that flow within the time range filter. 
Let me give an example (a Hadoop sleep job for simplicity).

Say user janedoe runs a Hadoop sleep job 3 times today, has run it 5 times 
yesterday, and ran it 6 times on one day about a month back. Now, we may want to 
see two different things:

#1 Summarized stats for the flow "Sleep job" invoked in the last 2 days: it would 
say this flow was run 8 times, the first run was at timestamp X, the last run was 
at timestamp Y, it took up a total of N megabytemillis, had a total of M 
containers across all runs, etc. It tells us how much of the cluster 
capacity a particular flow from a particular user is taking up.

#2 List of flow runs: this will show us details about each flow run. If we say 
limit = 3 in the query parameters, it would return the latest 3 runs of this flow. 
If we say limit = 100, it would return all the runs in this particular case 
(including the ones from a month back). If we pass in flowVersion=XXYYZZ, then 
it would return the list of flows that match this version. 

For the initial development, I think we may want to work on #2 first (returning 
the list of flow runs). The summary API will need aggregated tables, which we can 
add later on; we could file a JIRA for that, my 2c.

bq. Do we plan to include additional info in the user table which can be used 
for filtering user level entities? I could not think of any use case, but just for 
flexibility I have added filters in the API getUserEntities.

I haven't looked at the code in detail, but as such, for user level entities 
we would want a time range, a limit on the number of records returned, a flow 
name filter, and a cluster name filter.

bq. I have included an API to query flow information based on the appid. As of 
now I return the flow to which the app belongs (which includes multiple runs) 
instead of the flow run it belongs to. Which is the more viable scenario? Or do 
we need to support both?

An app id can belong to exactly one flow run. The app id is the Hadoop YARN 
application id, which should be unique on the cluster. Given an app id, we 
should be able to look up the exact flow run and return just that. The 
equivalent API in hRaven is /jobFlow.

bq. But if metrics are aggregated daily and weekly, we won't be able to get 
something like the value of a specific metric for a flow from, say, Thursday 4 pm 
to Friday 9 am. Vrushali C, can you confirm? If this is so, a timestamp doesn't 
make much sense. Dates can be specified instead.

The thinking is to split the querying across tables. We would query both the 
daily summary table for the complete day details and the regular flow tables 
for the details like those of Thursday 4 pm to Friday 9 am. But this does mean 
aggregating on the query side. So, I think, for starters, we could start off by 
allowing Date boundaries. We can enhance the API to accept finer timestamps 
later.

bq. Will there be queue table(s) in addition to user table(s)? If yes, how 
will queue data be aggregated? Based on entity type? I may need an additional 
API for queues then.

Yes, we would need a queue-based aggregation table. Right now, those details 
are still to be worked out. So perhaps we can leave aside the queue-based APIs 
(or file a different JIRA to handle queue-based APIs).

Hope this helps. I can give you more examples if you would like more 
details or have any other questions. I will also look at the patch this week. 
Also, we should ensure we use the same classes/methods for key-related 
construction and parsing (flow keys, row keys) across the reader and writer 
APIs, or else they will diverge.

thanks
Vrushali
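
To make the two query shapes above concrete, here is a minimal, purely 
hypothetical sketch of what the corresponding reader calls could look like; none 
of these names come from the actual YARN-3051 patch.

{code}
import java.util.Set;

// Hypothetical sketch of the two flow query shapes discussed above
// (#2: list of flow runs, #1: aggregated flow summary). All names are
// illustrative and not taken from the YARN-3051 patch.
public interface FlowReaderSketch {

  /** #2: latest runs of (cluster, user, flowName), optionally filtered, capped by limit. */
  Set<TimelineEntitySketch> getFlowRuns(String clusterId, String userId,
      String flowName, String flowVersion, long createdTimeBegin,
      long createdTimeEnd, int limit);

  /** #1: a single entity with metrics aggregated across all runs in the time window. */
  TimelineEntitySketch getFlowSummary(String clusterId, String userId,
      String flowName, long createdTimeBegin, long createdTimeEnd);

  /** Lookup of the single flow run that a given YARN application id belongs to. */
  TimelineEntitySketch getFlowRunForApp(String clusterId, String appId);

  /** Placeholder for whatever entity type the real reader interface returns. */
  interface TimelineEntitySketch {
  }
}
{code}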


> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-305

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561741#comment-14561741
 ] 

Varun Saxena commented on YARN-3411:


[~zjshen], I was actually talking about the store insertion time and not the entity 
start time.

If you look at {{LevelDbTimelineStore#checkStartTimeInDb}}, you will find that 
a store insert time (which is taken as the current system time) is also added 
in addition to the entity start time. Please note that the store insert time and 
the entity start time are not the same.

In ATSv1, we could specify a timestamp in the query which is used to ignore 
entities that were inserted into the store after it. This is done by matching 
against the store insert time (which is not the same as the entity start time).

So for backward compatibility's sake, do we need to support it? If yes, I don't 
see it being captured as part of the writer implementations as of now.
If there is no use case for it though, we can drop it in ATSv2.
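
For clarity, a tiny illustrative sketch of the ATSv1-style semantics being 
described: the store stamps each entity with its own insertion time at write, and 
a query-supplied timestamp filters out anything inserted after it. This is not 
the v2 writer code.

{code}
// Illustrative only: ATSv1-style "store insert time" filtering as described above.
public final class InsertTimeFilterSketch {

  static final class StoredEntity {
    final String id;
    final long entityStartTime;  // when the entity itself started
    final long storeInsertTime;  // when it was written to the store (not the same thing)

    StoredEntity(String id, long entityStartTime, long storeInsertTime) {
      this.id = id;
      this.entityStartTime = entityStartTime;
      this.storeInsertTime = storeInsertTime;
    }
  }

  /** Mimics v1 behavior: keep only entities inserted at or before the query timestamp. */
  static boolean visibleAt(StoredEntity e, long queryTimestamp) {
    return e.storeInsertTime <= queryTimestamp;
  }
}
{code}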

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Fix For: YARN-2928
>
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561739#comment-14561739
 ] 

Varun Saxena commented on YARN-3411:


[~zjshen], I was actually talking about the store insertion time and not the entity 
start time.

If you look at {{LevelDbTimelineStore#checkStartTimeInDb}}, you will find that 
a store insert time (which is taken as the current system time) is also added 
in addition to the entity start time. Please note that the store insert time and 
the entity start time are not the same.

In ATSv1, we could specify a timestamp in the query which is used to ignore 
entities that were inserted into the store after it. This is done by matching 
against the store insert time (which is not the same as the entity start time).

So for backward compatibility's sake, do we need to support it? If yes, I don't 
see it being captured as part of the writer implementations as of now.
If there is no use case for it though, we can drop it in ATSv2.

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Fix For: YARN-2928
>
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561722#comment-14561722
 ] 

MENG DING commented on YARN-1197:
-

[~leftnoteasy]
Makes sense to me. Will update the doc to include this.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
> yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes resource allocated to a 
> container is fixed during the lifetime of it. When users want to change a 
> resource 
> of an allocated container the only way is releasing it and allocating a new 
> container with expected size.
> Allowing run-time changing resources of an allocated container will give us 
> better control of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty

2015-05-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3725:
-

 Summary: App submission via REST API is broken in secure mode due 
to Timeline DT service address is empty
 Key: YARN-3725
 URL: https://issues.apache.org/jira/browse/YARN-3725
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.7.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


YARN-2971 changed TimelineClient to use the service address from the Timeline DT to 
renew the DT instead of the configured address. This breaks the procedure of 
submitting a YARN app via the REST API in secure mode.

The problem is that the service address is set by the client instead of the server 
in the Java code. The REST API response is an encoded token String, so it is 
inconvenient to deserialize it, set the service address, and serialize it 
again.
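
To illustrate the inconvenience: with the current behaviour, a REST client would have 
to do something like the following before the token can be renewed. This is only a 
sketch assuming the standard {{Token}} URL-string helpers; the class name and the idea 
of patching the token on the client side are illustrative, not a proposed fix.

{code}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Sketch only: deserialize the encoded token string returned by the REST
// API, set the missing service address, and serialize it back.
public final class TimelineTokenPatcher {
  public static String setServiceAddress(String encodedToken,
      String timelineServiceAddr) throws IOException {
    Token<TokenIdentifier> token = new Token<TokenIdentifier>();
    token.decodeFromUrlString(encodedToken);         // decode REST response
    token.setService(new Text(timelineServiceAddr)); // e.g. "host:port"
    return token.encodeToUrlString();                // re-encode for later use
  }
}
{code}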



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561682#comment-14561682
 ] 

Zhijie Shen commented on YARN-3411:
---

Yeah, in v1 there's a start time for each entity, which indicates when the 
entity starts to exist. This value is used in multiple places. For example, 
when we query entities, the matched entities are sorted by this timestamp 
before being returned. Also, in v1 the retention granularity is at the entity 
level: we check whether the start time of an entity is outside the TTL, and 
then decide to discard it and its events.
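
As a rough illustration of the entity-level retention decision described above (a 
sketch, not the actual v1 code):

{code}
// Sketch only: ATSv1-style entity-level retention check.
public final class RetentionCheck {
  /** True if the entity (and its events) should be discarded. */
  public static boolean isExpired(long entityStartTime, long ttlMillis) {
    return System.currentTimeMillis() - entityStartTime > ttlMillis;
  }
}
{code}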

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Fix For: YARN-2928
>
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561659#comment-14561659
 ] 

Varun Saxena commented on YARN-3411:


In ATSv1, we consider the timestamp at which an entity is added to the backend store 
in addition to the entity creation time. This is used while filtering out entities 
during querying. I cannot see this being captured specifically in this patch. 
It can easily be added to the column family info.

[~zjshen], [~sjlee0], do we need to add this info? Zhijie, do you know of any 
specific use case for this in ATSv1?
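
If we do decide to capture it, one possible way is to write the store-insert time as 
an extra cell in the entity row. The sketch below assumes the HBase 1.x client API; 
the column family and qualifier names are made up for illustration and are not from 
the patch.

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: record the store-insert time alongside the entity row.
public final class InsertTimeWriter {
  public static void writeInsertTime(Table entityTable, byte[] entityRowKey)
      throws IOException {
    Put put = new Put(entityRowKey);
    put.addColumn(Bytes.toBytes("i"),                 // info column family
        Bytes.toBytes("store_insert_time"),
        Bytes.toBytes(System.currentTimeMillis()));
    entityTable.put(put);
  }
}
{code}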

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Fix For: YARN-2928
>
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs

2015-05-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561651#comment-14561651
 ] 

Zhijie Shen commented on YARN-3700:
---

+1, the latest patch LGTM. Will commit it.

> ATS Web Performance issue at load time when large number of jobs
> 
>
> Key: YARN-3700
> URL: https://issues.apache.org/jira/browse/YARN-3700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, 
> YARN-3700.2.2.patch, YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch
>
>
> Currently, we load all the apps when we try to load the yarn 
> timelineservice web page. If we have a large number of jobs, it will be very 
> slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561644#comment-14561644
 ] 

Hadoop QA commented on YARN-3581:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 29s | The applied patch generated  6 
new checkstyle issues (total was 40, now 42). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 52s | Tests passed in 
hadoop-yarn-client. |
| | |  42m 29s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735661/YARN-3581.20150528-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c46d4ba |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8102/artifact/patchprocess/diffcheckstylehadoop-yarn-client.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8102/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8102/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8102/console |


This message was automatically generated.

> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we don't 
> need this option any more: an admin can configure queue settings, start the RM and 
> configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin has configured labels to be stored 
> on local disk.
> Second, when the admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could make the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-05-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561641#comment-14561641
 ] 

Varun Saxena commented on YARN-3051:


In the API designed in the patch, there are a few things I wanted to discuss (a rough 
sketch of the reader methods in question follows this list).

# We can either return a single timeline entity for a flow ID (having aggregated 
metric values) or multiple entities indicating multiple flow runs for a flow ID. I 
have included an API for the former as of now. I think there can be use cases for 
both, though. [~vrushalic], did hRaven have the facility for both kinds of queries? 
I mean, is there a known use case?
# Do we plan to include additional info in the user table which can be used for 
filtering user-level entities? I could not think of any use case, but just for 
flexibility I have added filters in the API {{getUserEntities}}.
# I have included an API to query flow information based on the app id. As of 
now I return the flow to which the app belongs (which includes multiple runs) instead 
of the flow run it belongs to. Which is the more viable scenario? Or do we need to 
support both?
# In the HBase schema design, there are 2 flow summary tables, aggregated daily 
and weekly respectively. So, to limit the number of metric records or to see 
metrics in a specific time window, I have added metric start and metric end 
timestamps in the API design. But if metrics are aggregated daily and weekly, 
we won't be able to get something like the value of a specific metric for a flow from, 
say, Thursday 4 pm to Friday 9 am. [~vrushalic], can you confirm? If that is 
so, a timestamp doesn't make much sense; dates can be specified instead.
# Will there be queue table(s) in addition to user table(s)? If yes, how will 
queue data be aggregated? Based on entity type? I may need an additional API 
for queues then.
# The doubt I have regarding flow version will anyway be addressed by YARN-3699.
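
To make the discussion concrete, below is a minimal sketch of the reader methods the 
points above refer to. All names and signatures are hypothetical illustrations 
(assuming the ATSv2 {{TimelineEntity}} record class), not the actual API in the 
attached patch.

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

/** Hypothetical reader sketch; names do not match the patch. */
public interface TimelineReaderSketch {

  /** Point 1: a single entity per flow ID with aggregated metric values. */
  TimelineEntity getFlowEntity(String clusterId, String userId, String flowId)
      throws IOException;

  /** Point 1 (alternative): one entity per run of the given flow ID. */
  Set<TimelineEntity> getFlowRunEntities(String clusterId, String userId,
      String flowId) throws IOException;

  /** Point 2: user-level entities with optional filters for flexibility. */
  Set<TimelineEntity> getUserEntities(String clusterId, String userId,
      Map<String, Object> filters) throws IOException;

  /** Point 3: the flow (all runs) that a given application belongs to. */
  Set<TimelineEntity> getFlowByAppId(String clusterId, String appId)
      throws IOException;
}
{code}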

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3603) Application Attempts page confusing

2015-05-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3603:
--
Attachment: 0001-YARN-3603.patch

Uploading an initial version of the patch.

* "Container ID" is shown only for running containers on the App Attempt page. 
Changed the column name to "Running Container ID".
* "AM Container" shows the container link when the attempt is running, else it 
shows the container ID in plain text. Here we can change the label to "AM 
Container Link" when the AM is running and "AM Container ID" when the AM is 
finished or killed.
* AM Container logs are shown on the App page but not on the app attempt page. Added 
an entry for this as "AM Container Logs".

> Application Attempts page confusing
> ---
>
> Key: YARN-3603
> URL: https://issues.apache.org/jira/browse/YARN-3603
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Thomas Graves
>Assignee: Sunil G
> Attachments: 0001-YARN-3603.patch
>
>
> The application attempts page 
> (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01)
> is a bit confusing about what is going on. I think the table of containers 
> there is only for running containers, and when the app is completed or killed 
> it's empty. The table should have a label on it stating so.
> Also, the "AM Container" field is a link when running but not when it's killed. 
> That might be confusing.
> There is no link to the logs on this page, but there is in the app attempt 
> table when looking at 
> http://rm:8088/cluster/app/application_1431101480046_0003



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3723) Need to clearly document primaryFilter and otherInfo value type

2015-05-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3723:
--
Attachment: YARN-3723.1.patch

Added some description of the value types and fixed a minor formatting issue 
in the document.

> Need to clearly document primaryFilter and otherInfo value type
> ---
>
> Key: YARN-3723
> URL: https://issues.apache.org/jira/browse/YARN-3723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-3723.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-05-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561613#comment-14561613
 ] 

Jason Lowe commented on YARN-3585:
--

Yes, the idea is to show whether we successfully closed the database or not 
when the problem occurs.  Sorry I wasn't clear on that.

> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Priority: Critical
>
> With NM recovery enabled, after decommission, the nodemanager log shows it has 
> stopped, but the process cannot end. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and jni leveldb thread stack
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3581:
-
Target Version/s: 2.8.0, 2.7.1  (was: 2.8.0)

> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we don't 
> need this option any more: an admin can configure queue settings, start the RM and 
> configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin has configured labels to be stored 
> on local disk.
> Second, when the admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could make the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3581) Deprecate -directlyAccessNodeLabelStore in RMAdminCLI

2015-05-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561590#comment-14561590
 ] 

Wangda Tan commented on YARN-3581:
--

[~Naganarasimha], 
Thanks for the update, the latest patch looks good. And I think it's better to add 
it to 2.7.1 as well to discourage people from using the option. We will not remove 
these options in 2.8, but we should let people know about the risk.

Wangda

> Deprecate -directlyAccessNodeLabelStore in RMAdminCLI
> -
>
> Key: YARN-3581
> URL: https://issues.apache.org/jira/browse/YARN-3581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3581.20150525-1.patch, YARN-3581.20150528-1.patch
>
>
> In 2.6.0, we added an option called "-directlyAccessNodeLabelStore" so that the 
> RM can start with label-configured queue settings. After YARN-2918, we don't 
> need this option any more: an admin can configure queue settings, start the RM and 
> configure node labels via RMAdminCLI without any error.
> In addition, this option is very restrictive. First, it needs to run on the 
> same node where the RM is running if the admin has configured labels to be stored 
> on local disk.
> Second, when the admin runs the option while the RM is running, multiple processes 
> may write to the same file, which could make the node label store invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3569) YarnClient.getAllQueues returns a list of queues that do not display running apps.

2015-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561580#comment-14561580
 ] 

Hadoop QA commented on YARN-3569:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 17s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 34s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 50s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   6m 57s | Tests failed in 
hadoop-yarn-client. |
| | |  46m 51s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.yarn.client.TestResourceTrackerOnHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12731637/YARN-3569.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c46d4ba |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8101/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8101/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8101/console |


This message was automatically generated.

> YarnClient.getAllQueues returns a list of queues that do not display running 
> apps.
> --
>
> Key: YARN-3569
> URL: https://issues.apache.org/jira/browse/YARN-3569
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Spandan Dutta
>Assignee: Spandan Dutta
> Attachments: YARN-3569.patch
>
>
> YarnClient.getAllQueues() returns a list of queues. If we pick a queue from 
> this list and call getApplications on it, we always get an empty list 
> even though applications are running on that queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561573#comment-14561573
 ] 

Wangda Tan commented on YARN-1197:
--

[~mding],
For the comparison of resources, I think for both increase and decrease it should 
be >= or <= for all dimensions. But if the resource calculator is the default one, 
increasing v-cores makes no sense. So I think the ResourceCalculator has to be used, 
but we also need to check all individual dimensions.

So the logic will be:
{code}
if (increase):
    delta = target - now
    if delta.mem < 0 || delta.vcore < 0:
        throw exception
    if resourceCalculator.lessOrEqualThan(delta, 0):
        throw exception
    // .. move forward
{code}
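
For reference, a rough translation of the pseudocode above into the existing resource 
helper APIs could look like the following. This is just a sketch of the intent 
(assuming the {{Resources}} utility methods and that a {{clusterResource}} is 
available from the scheduler context), not code from any patch.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch only: validate a container increase request. Reject it if any
// single dimension shrinks, or if the overall delta is not a real increase
// according to the configured ResourceCalculator.
public final class IncreaseValidator {
  public static void validateIncrease(ResourceCalculator rc,
      Resource clusterResource, Resource now, Resource target) {
    Resource delta = Resources.subtract(target, now);
    if (delta.getMemory() < 0 || delta.getVirtualCores() < 0) {
      throw new IllegalArgumentException(
          "Increase request shrinks at least one resource dimension");
    }
    if (Resources.lessThanOrEqual(rc, clusterResource, delta,
        Resources.none())) {
      throw new IllegalArgumentException(
          "Increase request does not actually increase the resource");
    }
    // ... move forward with the increase
  }
}
{code}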

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
> yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resource allocated to a 
> container is fixed during its lifetime. When users want to change the resource 
> of an allocated container, the only way is to release it and allocate a new 
> container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give us 
> better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

