[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2893:

Attachment: YARN-2893.005.patch

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, 
 YARN-2893.005.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.
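 For context, the tokens in the AM launch context are read back with 
 {{Credentials#readTokenStorageStream}}; a truncated or otherwise corrupt byte 
 buffer surfaces there as an EOFException. A minimal sketch of that read path 
 (illustrative only, not the attached patch):
 {code}
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import org.apache.hadoop.io.DataInputByteBuffer;
 import org.apache.hadoop.security.Credentials;

 public class TokenReadSketch {
   /** Parse launch-context tokens; an incomplete buffer fails with EOFException. */
   public static Credentials parseTokens(ByteBuffer tokens) throws IOException {
     DataInputByteBuffer dib = new DataInputByteBuffer();
     dib.reset(tokens.duplicate());
     Credentials credentials = new Credentials();
     credentials.readTokenStorageStream(dib); // EOFException if bytes are missing
     return credentials;
   }
 }
 {code}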



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-30 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3561:
-
Fix Version/s: (was: 2.6.1)

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical

 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting from when the relevant Slider 
 AM and NIMBUS containers were allocated until the Slider AM was 
 stopped. Also, most of the memory-usage log lines were removed, 
 keeping only a few from the start and end of each segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/api-util-1.0.0-M20.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/confdir/log4j-server.properties 
 transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,358 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 

[jira] [Commented] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-04-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521200#comment-14521200
 ] 

Steve Loughran commented on YARN-3561:
--

Vinod probably means the AM restart flag.

There are two ways to stop the Slider AM:
{code}
slider stop $clustername
{code}

This sends an RPC call to the AM, which then unregisters and shuts down. I'll check 
to make sure we explicitly release containers.
{code}
slider stop $clustername --force
{code}

This asks YARN to kill the app; the AM doesn't get told about it.

There's also a third way:

{code}
slider am-suicide $clustername
{code}

This is only for testing; it causes the AM to call {{System.exit(-1)}}. YARN will 
restart it unless it has failed too many times already.
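
For reference, the clean-stop path is essentially the standard AM shutdown 
sequence; a hedged sketch using the YARN client API (class name and message 
strings here are illustrative, not Slider's actual code):
{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class AmShutdownSketch {
  /** Release all known containers, then unregister so the RM marks the app finished. */
  public static void stopCleanly(AMRMClient<AMRMClient.ContainerRequest> amRmClient,
      Iterable<Container> liveContainers) throws Exception {
    for (Container c : liveContainers) {
      amRmClient.releaseAssignedContainer(c.getId()); // the RM asks the NMs to stop these
    }
    amRmClient.unregisterApplicationMaster(
        FinalApplicationStatus.SUCCEEDED, "stopped by slider stop", null);
    amRmClient.stop();
  }
}
{code}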

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical

 Non-AM containers continue to run even after application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where Slider AM was running
 *host-03* - where Storm NIMBUS container was running.
 *Note:* The logs are partial, starting from when the relevant Slider 
 AM and NIMBUS containers were allocated until the Slider AM was 
 stopped. Also, most of the memory-usage log lines were removed, 
 keeping only a few from the start and end of each segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/jettison-1.1.jar
  transitioned from INIT to 

[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520961#comment-14520961
 ] 

zhihai xu commented on YARN-2893:
-

Thanks [~jira.shegalov], that is a good catch. I uploaded a new patch, 
YARN-2893.005.patch, which fixes the double-indentation checkstyle violation.

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, 
 YARN-2893.005.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-04-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3552:
-
Attachment: 0001-YARN-3552.patch

 RM Web UI shows -1 running containers for completed apps
 

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 
 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers. 
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}
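 One possible way to avoid the -1 would be to make the dummy report's defaults 
 non-negative, e.g. (a sketch only; the attached patch may take a different approach):
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(0, 0,
   Resources.createResource(0, 0), Resources.createResource(0, 0),
   Resources.createResource(0, 0), 0, 0);
 {code}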



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521027#comment-14521027
 ] 

Hadoop QA commented on YARN-3552:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   7m 38s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  62m 58s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 106m 14s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729401/0001-YARN-3552.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / aa22450 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7550/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7550/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7550/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7550/console |


This message was automatically generated.

 RM Web UI shows -1 running containers for completed apps
 

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 
 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers. 
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520957#comment-14520957
 ] 

Hadoop QA commented on YARN-3271:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   7m  0s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 3  line(s) that 
end in whitespace. |
| {color:green}+1{color} | javac |   8m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   4m 43s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  51m 45s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  76m 33s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729398/YARN-3271.2.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / aa22450 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7549/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7549/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7549/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7549/console |


This message was automatically generated.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch, YARN-3271.2.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-04-30 Thread Ryu Kobayashi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520954#comment-14520954
 ] 

Ryu Kobayashi commented on YARN-3552:
-

I think this problem also exists in the FairScheduler, in 
FairSchedulerAppsBlock.java:
{code}
  .append(appInfo.getRunningContainers()).append(",")
{code}
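
A render-time guard would also work, clamping the value in the apps block 
(a hypothetical one-liner, not necessarily the committed fix):
{code}
  .append(String.valueOf(Math.max(appInfo.getRunningContainers(), 0))).append(",")
{code}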

 RM Web UI shows -1 running containers for completed apps
 

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers. 
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler

2015-04-30 Thread Dian Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated YARN-3557:
--
Attachment: Support TXT in YARN high level design doc.pdf

A high-level design doc is attached.

 Support Intel Trusted Execution Technology(TXT) in YARN scheduler
 -

 Key: YARN-3557
 URL: https://issues.apache.org/jira/browse/YARN-3557
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Dian Fu
 Attachments: Support TXT in YARN high level design doc.pdf


 Intel TXT defines platform-level enhancements that provide the building 
 blocks for creating trusted platforms. A TXT-aware YARN scheduler can 
 schedule security-sensitive jobs on TXT-enabled nodes only. YARN-2492 
 provides the capability to restrict YARN applications to run only on cluster 
 nodes that have a specified node label. This is a good mechanism that can be 
 utilized for a TXT-aware YARN scheduler.
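 As an illustration of the node-label route (the label name "txt" and the client 
 call below are assumptions for the example, not part of the attached design doc):
 {code}
 import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

 public class TxtLabelSketch {
   /** Constrain an application to nodes carrying a (hypothetical) "txt" node label. */
   public static void restrictToTxtNodes(ApplicationSubmissionContext appContext) {
     // With YARN-2492 node labels, containers of this app are placed only on
     // nodes whose labels satisfy the expression "txt".
     appContext.setNodeLabelExpression("txt");
   }
 }
 {code}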



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521161#comment-14521161
 ] 

Hadoop QA commented on YARN-3552:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  1s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 56s | The applied patch generated  2 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 15s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  52m 27s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 35s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729420/0001-YARN-3552.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f5b3847 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7552/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7552/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7552/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7552/console |


This message was automatically generated.

 RM Web UI shows -1 running containers for completed apps
 

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 
 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers. 
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-04-30 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521001#comment-14521001
 ] 

Rohith commented on YARN-3552:
--

My bad, I missed it. Thanks for pointing it out; I will update the patch.

 RM Web UI shows -1 running containers for completed apps
 

 Key: YARN-3552
 URL: https://issues.apache.org/jira/browse/YARN-3552
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG


 In RMServerUtils, the default values are negative numbers, which results in 
 the RM web UI also displaying negative numbers. 
 {code}
   public static final ApplicationResourceUsageReport
 DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
   Resources.createResource(-1, -1), 0, 0);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521095#comment-14521095
 ] 

Hadoop QA commented on YARN-2893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m 32s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 16s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  52m 35s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 57s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729414/YARN-2893.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / aa22450 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7551/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7551/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7551/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7551/console |


This message was automatically generated.

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, 
 YARN-2893.005.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521217#comment-14521217
 ] 

Steve Loughran commented on YARN-3539:
--

works for me

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3476:
-
Attachment: 0001-YARN-3476.patch

Apologies for the delay in getting back to this issue.
bq. Would it be better to wrap the cleanup in a finally block or something a 
little more broadly applicable to errors that occur?
Makes sense to me.

Uploading a patch that handles the exception and does the post-cleanup.
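
The shape of the change is roughly a try/catch/finally around the upload so that 
local cleanup always runs; a sketch with assumed helper names, not the patch itself:
{code}
// Inside the app log aggregator's aggregation step (method names are placeholders):
try {
  uploadLogsForContainers();          // TFile errors such as IllegalStateException surface here
} catch (Exception e) {
  LOG.error("Error aggregating logs for " + applicationId, e);
} finally {
  doAppLogAggregationPostCleanUp();   // delete local container logs regardless of upload outcome
}
{code}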

 Nodemanager can fail to delete local logs if log aggregation fails
 --

 Key: YARN-3476
 URL: https://issues.apache.org/jira/browse/YARN-3476
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Rohith
 Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch


 If log aggregation encounters an error trying to upload the file, the 
 underlying TFile can throw an IllegalStateException, which will bubble up 
 to the top of the thread and prevent the application logs from being 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Report node resource utilization

2015-04-30 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521261#comment-14521261
 ] 

Varun Vasudev commented on YARN-3534:
-

Thanks for the patch, [~elgoiri]! Is it possible for you to split the patch 
into two - one to record the memory and CPU utilization in 
NodeResourceMonitorImpl and one for the RPC changes? If you could file a 
separate JIRA for the NodeResourceMonitorImpl changes that you have made (they 
are definitely required and should go in), that would be ideal.

I'm not sure we should use RPC to expose the stats. Using RPC restricts the 
stats to a limited number of services, whereas using REST lets a larger number 
of services access the stats - which is why YARN-3332 wants to use REST to 
expose the NM stats. Using REST should also make it easier to add more stats 
with respect to utilization.



 Report node resource utilization
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. To this end, this task will implement the NodeResourceMonitor and 
 send this information to the ResourceManager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521406#comment-14521406
 ] 

Hudson commented on YARN-3517:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #170 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/170/])
YARN-3517. RM web ui for dumping scheduler logs should be for admins only 
(Varun Vasudev via tgraves) (tgraves: rev 
2e215484bd05cd5e3b7a81d3558c6879a05dd2d2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java


 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.
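 A hedged sketch of what an admin-only guard in the web services could look like 
 (the ACL-manager usage and exception choice here are assumptions, not necessarily 
 what the committed patch does):
 {code}
 import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.yarn.security.AdminACLsManager;

 public class DumpLogsGuardSketch {
   /** Reject a scheduler-log dump request unless the caller is a YARN admin. */
   public static void checkCallerIsAdmin(AdminACLsManager aclsManager, String remoteUser) {
     UserGroupInformation callerUGI = UserGroupInformation.createRemoteUser(remoteUser);
     if (aclsManager.areACLsEnabled() && !aclsManager.isAdmin(callerUGI)) {
       throw new SecurityException("Only admins may dump the scheduler logs");
     }
   }
 }
 {code}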



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521403#comment-14521403
 ] 

Hudson commented on YARN-3533:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #170 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/170/])
YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. 
Contributed by Anubhav Dhoot (jianhe: rev 
4c1af156aef4f3bb1d9823d5980c59b12007dc77)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]
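 The gist of the fix is to wait until the attempt is actually SCHEDULED before the 
 NM heartbeat is sent; an illustrative sequence in MockRM style (exact helper calls 
 may differ from the committed patch):
 {code}
 RMApp app = rm.submitApp(1024);
 MockNM nm = rm.registerNode("127.0.0.1:1234", 8192);
 RMAppAttempt attempt = app.getCurrentAppAttempt();
 // Block here so the NM update cannot race ahead of the scheduler:
 rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.SCHEDULED);
 nm.nodeHeartbeat(true);  // triggers allocation of the AM container
 rm.waitForState(attempt.getAppAttemptId(), RMAppAttemptState.ALLOCATED);
 MockAM am = rm.sendAMLaunched(attempt.getAppAttemptId());
 am.registerAppAttempt();
 {code}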



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521449#comment-14521449
 ] 

Hudson commented on YARN-3533:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #913 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/913/])
YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. 
Contributed by Anubhav Dhoot (jianhe: rev 
4c1af156aef4f3bb1d9823d5980c59b12007dc77)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521465#comment-14521465
 ] 

Hadoop QA commented on YARN-3271:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m 11s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   5m 32s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 15s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  52m 56s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  74m 59s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729453/YARN-3271.3.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / de9404f |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7553/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7553/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7553/console |


This message was automatically generated.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521471#comment-14521471
 ] 

Hadoop QA commented on YARN-3476:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   7m 40s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  1s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   5m 58s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  48m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729459/0001-YARN-3476.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / de9404f |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/whitespace.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7554/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7554/console |


This message was automatically generated.

 Nodemanager can fail to delete local logs if log aggregation fails
 --

 Key: YARN-3476
 URL: https://issues.apache.org/jira/browse/YARN-3476
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Rohith
 Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch


 If log aggregation encounters an error trying to upload the file, the 
 underlying TFile can throw an IllegalStateException, which will bubble up 
 to the top of the thread and prevent the application logs from being 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521393#comment-14521393
 ] 

Hudson commented on YARN-3533:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #179 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/179/])
YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. 
Contributed by Anubhav Dhoot (jianhe: rev 
4c1af156aef4f3bb1d9823d5980c59b12007dc77)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521396#comment-14521396
 ] 

Hudson commented on YARN-3517:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #179 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/179/])
YARN-3517. RM web ui for dumping scheduler logs should be for admins only 
(Varun Vasudev via tgraves) (tgraves: rev 
2e215484bd05cd5e3b7a81d3558c6879a05dd2d2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java


 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521475#comment-14521475
 ] 

Junping Du commented on YARN-1402:
--

Forgot to mention: after this patch, keepAliveApplications in the heartbeat request 
is no longer required. We should start work to remove it in another JIRA.

 Related Web UI, CLI changes on exposing client API to check log aggregation 
 status
 --

 Key: YARN-1402
 URL: https://issues.apache.org/jira/browse/YARN-1402
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
 YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521383#comment-14521383
 ] 

Hudson commented on YARN-3533:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2111 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2111/])
YARN-3533. Test: Fix launchAM in MockRM to wait for attempt to be scheduled. 
Contributed by Anubhav Dhoot (jianhe: rev 
4c1af156aef4f3bb1d9823d5980c59b12007dc77)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


 Test: Fix launchAM in MockRM to wait for attempt to be scheduled
 

 Key: YARN-3533
 URL: https://issues.apache.org/jira/browse/YARN-3533
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3533.001.patch


 MockRM#launchAM fails in many test runs because it does not wait for the app 
 attempt to be scheduled before NM update is sent as noted in [recent 
 builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521386#comment-14521386
 ] 

Hudson commented on YARN-3517:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2111 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2111/])
YARN-3517. RM web ui for dumping scheduler logs should be for admins only 
(Varun Vasudev via tgraves) (tgraves: rev 
2e215484bd05cd5e3b7a81d3558c6879a05dd2d2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* hadoop-yarn-project/CHANGES.txt


 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521452#comment-14521452
 ] 

Hudson commented on YARN-3517:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #913 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/913/])
YARN-3517. RM web ui for dumping scheduler logs should be for admins only 
(Varun Vasudev via tgraves) (tgraves: rev 
2e215484bd05cd5e3b7a81d3558c6879a05dd2d2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


 RM web ui for dumping scheduler logs should be for admins only
 --

 Key: YARN-3517
 URL: https://issues.apache.org/jira/browse/YARN-3517
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, security
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
  Labels: security
 Fix For: 2.8.0

 Attachments: YARN-3517.001.patch, YARN-3517.002.patch, 
 YARN-3517.003.patch, YARN-3517.004.patch, YARN-3517.005.patch, 
 YARN-3517.006.patch


 YARN-3294 allows users to dump scheduler logs from the web UI. This should be 
 for admins only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-04-30 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3271:

Attachment: YARN-3271.3.patch

Updated the patch with the whitespace fix and the test fix.

 FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler 
 to TestAppRunnability
 ---

 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Assignee: nijel
 Attachments: YARN-3271.1.patch, YARN-3271.2.patch, YARN-3271.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3559) Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public

2015-04-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521226#comment-14521226
 ] 

Steve Loughran commented on YARN-3559:
--

I've not touched it - it is one line. What it does need, though, is agreement 
that the class really is public. 

(moving to HADOOP-)

 Mark org.apache.hadoop.security.token.Token as @InterfaceAudience.Public
 

 Key: YARN-3559
 URL: https://issues.apache.org/jira/browse/YARN-3559
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Steve Loughran

 {{org.apache.hadoop.security.token.Token}} is tagged 
 {{@InterfaceAudience.LimitedPrivate}} for HDFS and MapReduce.
 However, it is used throughout YARN apps, where both the clients and the AM 
 need to work with tokens. This class and related classes all need to be 
 declared public. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521578#comment-14521578
 ] 

Junping Du commented on YARN-3445:
--

Thanks for comments, [~vinodkv]!

bq. logAggregationReportsForApps itself is a map of ApplicationID with a nested 
LogAggregationReport.ApplicationID - duplicate AppID information
Are you suggesting we should replace the map with a list in 
NodeHeartbeatRequest? I fully agree and will suggest doing so in YARN-3505.

bq. runningApplications in this patch
In the v2 patch, runningApplications is already removed. Could you kindly check 
the v2 patch again?

bq. NodeStatus.keepAliveApplications
I agree. This shouldn't be needed anymore after YARN-1402. I had a similar 
idea before when syncing with Xuan but forgot to put it on JIRA. Maybe we should 
file a separate JIRA to fix it?

CC [~xgong].

 Cache runningApps in RMNode for getting running apps on given NodeId
 

 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3445-v2.patch, YARN-3445.patch


 Per discussion in YARN-3334, we need to filter out unnecessary collector info 
 from the RM in the heartbeat response. Our proposal is to add a cache of 
 runningApps in RMNode, so the RM only sends back collectors for locally running 
 apps. This is also needed in YARN-914 (graceful decommission): if there are no 
 running apps on an NM that is in the decommissioning stage, it will get 
 decommissioned immediately. 
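 For illustration only (the class and method names below are hypothetical and 
 not the RMNodeImpl change in the attached patches), the cached per-node view 
 can be as simple as a concurrent set of application IDs maintained from 
 container start/finish events:
 {code}
 import java.util.Collections;
 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;

 /** Sketch: track apps currently running on a node so the RM can return only
  *  the collectors relevant to that node in the heartbeat response. */
 public class RunningAppsCache {
   private final Set<String> runningApps =
       Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

   /** Called when the first container of the app starts on this node. */
   public void appStarted(String applicationId) {
     runningApps.add(applicationId);
   }

   /** Called when the last container of the app finishes on this node. */
   public void appFinished(String applicationId) {
     runningApps.remove(applicationId);
   }

   /** Heartbeat path: only ship collector info for these apps. */
   public Set<String> getRunningApps() {
     return Collections.unmodifiableSet(runningApps);
   }

   /** YARN-914: a decommissioning node with no running apps can be
    *  decommissioned immediately. */
   public boolean readyForImmediateDecommission() {
     return runningApps.isEmpty();
   }
 }
 {code}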



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521580#comment-14521580
 ] 

Thomas Graves commented on YARN-3243:
-

[~leftnoteasy] Can we pull this back into the branch-2.7?  

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.
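 As a tiny standalone illustration of the proposed propagation rule above (this 
 is a sketch, not the actual CapacityScheduler code), headroom(child) = 
 min(headroom(parent), parent.max - parent.used) caps A1/A2 at 1 unit in the 
 example, so A can never exceed its max of 55:
 {code}
 /** Illustrative sketch of headroom propagation from parent to children. */
 public class HeadroomPropagation {

   /** headroom(child) = min(headroom(parent), parent.max - parent.used). */
   static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
     return Math.min(parentHeadroom, parentMax - parentUsed);
   }

   public static void main(String[] args) {
     // Numbers from the example: A has usage=54, max=55; the root imposes no cap.
     long headroomForChildrenOfA = childHeadroom(Long.MAX_VALUE, 55, 54);
     System.out.println("Headroom A passes to A1/A2: " + headroomForChildrenOfA); // 1

     // A2 (usage=1, max=55) is further capped by its own limits.
     long a2CanAllocate = Math.min(headroomForChildrenOfA, 55 - 1);
     System.out.println("A2 may allocate at most: " + a2CanAllocate);             // 1
   }
 }
 {code}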



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Report node resource utilization

2015-04-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521924#comment-14521924
 ] 

Karthik Kambatla commented on YARN-3534:


bq. I'm not sure if we should use RPC to expose the stats. 
As YARN-3332 suggests, we should expose these stats through REST. However, for 
the RM (scheduler) to get this information, I think RPC (through node 
heartbeat) is better. That way, the RM doesn't have to poll the REST API for 
every node there is an NM heartbeat for. 

If the suggestion is for the NM to get this information from the REST API, I 
think that is fine. However, since this communication is private to NM, we can 
always change this later once YARN-3332 is ready. 




 Report node resource utilization
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.
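 A minimal sketch of what such a monitor thread could look like (this uses the 
 JDK's com.sun.management.OperatingSystemMXBean purely for illustration; the 
 actual NodeResourceMonitor in the attached patches may sample utilization 
 differently):
 {code}
 import java.lang.management.ManagementFactory;
 import com.sun.management.OperatingSystemMXBean;

 /** Sketch: periodically sample node utilization so the NM heartbeat path can
  *  attach the latest snapshot. Not the actual NodeResourceMonitor. */
 public class NodeUtilizationMonitor implements Runnable {
   private volatile long usedPhysicalMemoryBytes;
   private volatile double cpuUtilization; // 0.0 - 1.0

   @Override
   public void run() {
     OperatingSystemMXBean os =
         (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
     while (!Thread.currentThread().isInterrupted()) {
       usedPhysicalMemoryBytes =
           os.getTotalPhysicalMemorySize() - os.getFreePhysicalMemorySize();
       cpuUtilization = os.getSystemCpuLoad();
       try {
         Thread.sleep(3000); // sampling interval; would be configurable in practice
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
       }
     }
   }

   /** Read by the heartbeat code to fill in the node utilization fields. */
   public long getUsedPhysicalMemoryBytes() { return usedPhysicalMemoryBytes; }
   public double getCpuUtilization() { return cpuUtilization; }
 }
 {code}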



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522021#comment-14522021
 ] 

Thomas Graves commented on YARN-3243:
-

I was wanting to pull YARN-3434 back into 2.7. It kind of depends on this one; 
at least I think it would merge cleanly if this one was there. 
This is also fixing a bug which I would like to see fixed in the 2.7 line if we 
are going to use it. It's not a blocker since it exists in our 2.6, but it would 
be nice to have. If we decide it's too big then I'll just port YARN-3434 back 
without it.

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3473) Fix RM Web UI configuration for some properties

2015-04-30 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521845#comment-14521845
 ] 

Ray Chiang commented on YARN-3473:
--

RE: No new unit tests

Passed visual inspection of the RM web UI.

 Fix RM Web UI configuration for some properties
 ---

 Key: YARN-3473
 URL: https://issues.apache.org/jira/browse/YARN-3473
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: BB2015-05-TBR
 Attachments: YARN-3473.001.patch


 Using the RM Web UI, the Tools -> Configuration page shows some properties as 
 something like BufferedInputStream instead of the appropriate .xml file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522014#comment-14522014
 ] 

Wangda Tan commented on YARN-3243:
--

Thanks comment from [~vinodkv].

I would be +1 for backporting this into branch-2.7: even though this patch is 
potentially required to support non-exclusive node labels, the patch itself is a 
bug fix rather than a new feature.

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3564) Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521973#comment-14521973
 ] 

Hudson commented on YARN-3564:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7706 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7706/])
YARN-3564. Fix 
TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
randomly. (Jian He via wangda) (wangda: rev 
e2e8f771183df798e926abc97116316a05b19c9a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java


 Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.8.0

 Attachments: YARN-3564.1.patch


 the test fails intermittently in jenkins 
 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2497) Changes for fair scheduler to support allocate resource respect labels

2015-04-30 Thread Viplav Madasu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522001#comment-14522001
 ] 

Viplav Madasu commented on YARN-2497:
-

I have the same comment as [~john.jian.fang]. [~yufeldman], any updates? Are you 
actively working on it? We would like to see the Fair Scheduler work with admin 
labels. Thanks.


 Changes for fair scheduler to support allocate resource respect labels
 --

 Key: YARN-2497
 URL: https://issues.apache.org/jira/browse/YARN-2497
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Yuliya Feldman





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler

2015-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521999#comment-14521999
 ] 

Wangda Tan commented on YARN-3557:
--

Hi [~dian.fu],
Thanks for posting the design doc. I just did a quick pass over it, and it seems 
to me that supporting TXT can stay outside of the YARN scheduler: the scheduler 
doesn't need to know whether a node is trusted or not; trusted would simply be a 
generic label on the node. Some questions on the design:

bq. Currently for centralized node label configuration, it only supports admin 
configure node label through CLI. Need to provide a mechanism at RM side which 
can configure node label in the similar way as YARN-2495.
The RM now supports configuring node labels via the CLI or the REST API; are 
these enough for you to configure an NM's trusted status?

bq. Currently user can configure centralized node label configuration or 
distributed node label configuration, but cannot configure both. 
Configuring both could be problematic; see my comment: 
https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14317048page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14317048.

Please let me know your thoughts.

 Support Intel Trusted Execution Technology(TXT) in YARN scheduler
 -

 Key: YARN-3557
 URL: https://issues.apache.org/jira/browse/YARN-3557
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Dian Fu
 Attachments: Support TXT in YARN high level design doc.pdf


 Intel TXT defines platform-level enhancements that provide the building 
 blocks for creating trusted platforms. A TXT-aware YARN scheduler can 
 schedule security-sensitive jobs on TXT-enabled nodes only. YARN-2492 
 provides the capability to restrict YARN applications to run only on cluster 
 nodes that have a specified node label. This is a good mechanism that can be 
 utilized for a TXT-aware YARN scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521802#comment-14521802
 ] 

Hadoop QA commented on YARN-3544:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m  0s | Pre-patch branch-2.7 
compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729519/YARN-3544-branch-2.7-1.2.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2.7 / 185a1ff |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7555/console |


This message was automatically generated.

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544-branch-2.7-1.2.patch, YARN-3544-branch-2.7-1.patch, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, node info is also empty. This is usually quite crucial when trying 
 to debug where an AM was launched and to find which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-30 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521881#comment-14521881
 ] 

Sidharta Seethana commented on YARN-3366:
-

hi [~hex108] ,

The goal here is to ensure that we have a maximum assigned bandwidth for all 
YARN containers when cluster admins require this. The behavior you want (where 
YARN containers are allowed to use the total bandwidth) is possible by simply 
not configuring max YARN outbound bandwidth - in which case it defaults to the 
total bandwidth. 

thanks

 Outbound network bandwidth : classify/shape traffic originating from YARN 
 containers
 

 Key: YARN-3366
 URL: https://issues.apache.org/jira/browse/YARN-3366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
 YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
 YARN-3366.006.patch, YARN-3366.007.patch


 In order to be able to isolate based on/enforce outbound traffic bandwidth 
 limits, we need  a mechanism to classify/shape network traffic in the 
 nodemanager. For more information on the design, please see the attached 
 design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-30 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3539:
-
Attachment: YARN-3539-005.patch

patch -005


h3. Valid entity types.

Currently / and   are allowed in entity types. I'm not sure that is a good 
idea. I'd also worry about < and > for security reasons. A regexp of A-Za-z0-9 
plus a limited set of other characters may be enough. We can always check with 
all known users to see what entity types they are using. Locking it down for v1 
allows v2 to be consistent.
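A sketch of the kind of restricted pattern being suggested (the exact character 
set below is only an assumption for illustration, not an agreed rule):
{code}
import java.util.regex.Pattern;

/** Sketch: restrict entity types to alphanumerics plus a small separator set. */
public class EntityTypeValidator {
  // Letters, digits and a limited set of separators; no '/', '<', '>' etc.
  private static final Pattern VALID_TYPE = Pattern.compile("[A-Za-z0-9_.-]+");

  public static boolean isValid(String entityType) {
    return entityType != null && VALID_TYPE.matcher(entityType).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("YARN_APPLICATION")); // true
    System.out.println(isValid("bad/type"));         // false
  }
}
{code}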

bq. 1. For the bullet points of Current Status and Future Plans, can we 
organize them a bit better. For example, we partition them into the groups of 
a) current status and b) future plans. For bullet 4, not just history, but all 
timeline data.

done

bq. 2. Can we move Timeline Server REST API section before Generic Data REST 
APIs?

done

bq. 3. Application elements table seems to be wrongly formatted. I think that's 
why site compilation is failed.

fixed 

bq. 4. Generic Data REST APIs output examples need to be slightly updated. 
Some more fields are added or changed.

Those are the examples from YARN-1876. If you have some more up to date ones 
I'll replace them.

done


bq. 5. Timeline Server REST API output examples are not genuine. Perhaps, we 
can run a simple MR example job, and get the up-to-date timeline entity and 
application info to show as the examples.

+1. Do you have this?

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521888#comment-14521888
 ] 

Wangda Tan commented on YARN-3565:
--

[~Naganarasimha], sure, please go ahead.

 NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
 instead of String
 -

 Key: YARN-3565
 URL: https://issues.apache.org/jira/browse/YARN-3565
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
Priority: Blocker

 Now NM HB/Register uses Set<String>; it will be hard to add new fields if we 
 want to support specifying NodeLabel attributes such as exclusivity/constraints, 
 etc. We need to make sure rolling upgrade works.
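 For illustration (the type below is hypothetical, not the actual protobuf/PBImpl 
 change), a structured label keeps room for extra attributes that a plain String 
 in a Set<String> cannot carry:
 {code}
 import java.util.Arrays;
 import java.util.HashSet;
 import java.util.Set;

 /** Hypothetical structured label for NM register/heartbeat payloads. */
 public class ReportedNodeLabel {
   private final String name;
   private final boolean exclusive; // an attribute a plain String cannot carry

   public ReportedNodeLabel(String name, boolean exclusive) {
     this.name = name;
     this.exclusive = exclusive;
   }

   public String getName() { return name; }
   public boolean isExclusive() { return exclusive; }

   public static void main(String[] args) {
     // Old style: the label set loses any per-label attributes.
     Set<String> oldStyle = new HashSet<>(Arrays.asList("gpu", "trusted"));

     // Structured style: attributes travel with each label.
     Set<ReportedNodeLabel> newStyle = new HashSet<>(Arrays.asList(
         new ReportedNodeLabel("gpu", true),
         new ReportedNodeLabel("trusted", false)));

     System.out.println(oldStyle.size() + " plain labels vs "
         + newStyle.size() + " structured labels");
   }
 }
 {code}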



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521893#comment-14521893
 ] 

Jian He commented on YARN-3445:
---

One other thing: the LogAggregationReport#(get/set)NodeId methods can also be 
removed as they are not used anywhere. 
I'm also unsure about the usage of 
LogAggregationReport#(get/set)DiagnosticMessage as it's only set with an empty 
string.

Agree we can have a separate JIRA to fix this, preferably in the same 2.8 
release.



 Cache runningApps in RMNode for getting running apps on given NodeId
 

 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3445-v2.patch, YARN-3445.patch


 Per discussion in YARN-3334, we need to filter out unnecessary collector info 
 from the RM in the heartbeat response. Our proposal is to add a cache of 
 runningApps in RMNode, so the RM only sends back collectors for locally running 
 apps. This is also needed in YARN-914 (graceful decommission): if there are no 
 running apps on an NM that is in the decommissioning stage, it will get 
 decommissioned immediately. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521896#comment-14521896
 ] 

Hadoop QA commented on YARN-3539:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   3m 38s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 4  line(s) that 
end in whitespace. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | site |   1m 56s | Site compilation is broken. |
| | |   5m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729541/YARN-3539-005.patch |
| Optional Tests | site |
| git revision | trunk / de9404f |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7556/artifact/patchprocess/whitespace.txt
 |
| site | 
https://builds.apache.org/job/PreCommit-YARN-Build/7556/artifact/patchprocess/patchSiteWarnings.txt
 |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7556/console |


This message was automatically generated.

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-04-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521926#comment-14521926
 ] 

Karthik Kambatla commented on YARN-3332:


[~vinodkv] - did you start implementing this? I would like to be involved in 
the work here - either implementing parts of it or reviewing most of it. 

 [Umbrella] Unified Resource Statistics Collection per node
 --

 Key: YARN-3332
 URL: https://issues.apache.org/jira/browse/YARN-3332
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: Design - UnifiedResourceStatisticsCollection.pdf


 Today in YARN, the NodeManager collects statistics like per-container resource 
 usage and the overall physical resources available on the machine. Currently 
 this is used internally in YARN by the NodeManager only for limited purposes: 
 automatically determining the capacity of resources on the node and enforcing 
 memory usage to what is reserved per container.
 This proposal is to extend the existing architecture and collect statistics 
 for usage beyond the existing use cases.
 Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3521:
--
Attachment: 0002-YARN-3521.patch

Thank you [~leftnoteasy] for sharing the comments.
Please find an updated patch addressing the comments.

Please check it and let me know your thoughts.

 Support return structured NodeLabel objects in REST API when call 
 getClusterNodeLabels
 --

 Key: YARN-3521
 URL: https://issues.apache.org/jira/browse/YARN-3521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch


 In YARN-3413, the yarn cluster CLI returns NodeLabel objects instead of Strings; 
 we should make the same change on the REST API side to keep them consistent.
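 For illustration, a structured entry in the REST response could be a small 
 JAXB-annotated bean like the sketch below (the class and field names here are 
 assumptions; the real DAO added by the patch may differ):
 {code}
 import javax.xml.bind.annotation.XmlAccessType;
 import javax.xml.bind.annotation.XmlAccessorType;
 import javax.xml.bind.annotation.XmlRootElement;

 /** Sketch of a structured node-label entry returned instead of a bare String. */
 @XmlRootElement(name = "nodeLabelInfo")
 @XmlAccessorType(XmlAccessType.FIELD)
 public class NodeLabelInfo {
   private String name;
   private boolean exclusivity;

   public NodeLabelInfo() {
     // no-arg constructor required by JAXB
   }

   public NodeLabelInfo(String name, boolean exclusivity) {
     this.name = name;
     this.exclusivity = exclusivity;
   }

   public String getName() { return name; }
   public boolean getExclusivity() { return exclusivity; }
 }
 {code}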



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521936#comment-14521936
 ] 

Karthik Kambatla commented on YARN-3481:


I have been working with [~rgrandl] on YARN-2965 (he shared his Tetris code 
privately). YARN-2965 aims to expose more than just CPU and memory - disk 
in/out bandwidth and network in/out bandwidth. I think it is okay to capture 
CPU and memory here, and add the remaining items in the context of that JIRA.



 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-3481
 URL: https://issues.apache.org/jira/browse/YARN-3481
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
   Original Estimate: 336h
  Remaining Estimate: 336h

 To allow the RM to make better scheduling decisions, it should be aware of the 
 actual utilization of the containers. The NM would aggregate the 
 ContainerMetrics and report them in every heartbeat.
 Related to YARN-1012, but aggregated to reduce the heartbeat overhead.
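 As a rough sketch of the aggregation step (standalone and illustrative; 
 ContainerUsage below is a made-up type, not the real ContainerMetrics API), the 
 NM would simply sum per-container samples into one node-level figure before 
 each heartbeat:
 {code}
 import java.util.Arrays;
 import java.util.List;

 /** Sketch: aggregate per-container usage for the NM heartbeat. */
 public class ContainerUtilizationAggregator {

   /** Made-up stand-in for a per-container usage sample. */
   public static class ContainerUsage {
     final long physicalMemoryBytes;
     final float vcoresUsed;
     ContainerUsage(long physicalMemoryBytes, float vcoresUsed) {
       this.physicalMemoryBytes = physicalMemoryBytes;
       this.vcoresUsed = vcoresUsed;
     }
   }

   /** Sum all container samples into a single node-level figure. */
   static String aggregate(List<ContainerUsage> samples) {
     long totalMemory = 0;
     float totalVcores = 0f;
     for (ContainerUsage u : samples) {
       totalMemory += u.physicalMemoryBytes;
       totalVcores += u.vcoresUsed;
     }
     return "memoryBytes=" + totalMemory + ", vcores=" + totalVcores;
   }

   public static void main(String[] args) {
     List<ContainerUsage> samples = Arrays.asList(
         new ContainerUsage(512L << 20, 0.4f),   // 512 MB, 0.4 vcores
         new ContainerUsage(1024L << 20, 1.2f)); // 1 GB, 1.2 vcores
     System.out.println("Aggregated utilization: " + aggregate(samples));
   }
 }
 {code}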



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3564) Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails randomly

2015-04-30 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3564:
-
Summary: Fix 
TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
randomly   (was: 
TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
randomly )

 Fix TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable fails 
 randomly 
 ---

 Key: YARN-3564
 URL: https://issues.apache.org/jira/browse/YARN-3564
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3564.1.patch


 the test fails intermittently in jenkins 
 https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522008#comment-14522008
 ] 

Vinod Kumar Vavilapalli commented on YARN-3243:
---

bq. Wangda Tan Can we pull this back into the branch-2.7?
I am not against it, but we need to rationalize why it needs to be pulled in, 
specifically given that this is a big patch. Also, it wouldn't stop here and 
you'll need more patches that depend on this?

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522058#comment-14522058
 ] 

Vinod Kumar Vavilapalli commented on YARN-3445:
---

bq. I agree. This shouldn't be needed anymore after YARN-1402. I had a similar 
idea before when syncing with Xuan but forgot to put it on JIRA. Maybe we should 
file a separate JIRA to fix it?
keepAliveApplications cannot be removed as we need to support protocol 
compatibility. But the new ones you added for logs can be removed as they are 
new. Can you take this forward also on YARN-3505?

bq. .. LogAggregationReport#(get/set)NodeId .. 
LogAggregationReport#(get/set)DiagnosticMessage ..
[~jianhe], can you file a ticket please?


 Cache runningApps in RMNode for getting running apps on given NodeId
 

 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3445-v2.patch, YARN-3445.patch


 Per discussion in YARN-3334, we need to filter out unnecessary collector info 
 from the RM in the heartbeat response. Our proposal is to add a cache of 
 runningApps in RMNode, so the RM only sends back collectors for locally running 
 apps. This is also needed in YARN-914 (graceful decommission): if there are no 
 running apps on an NM that is in the decommissioning stage, it will get 
 decommissioned immediately. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-04-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1451#comment-1451
 ] 

Jian He commented on YARN-3505:
---

One other thing: the LogAggregationReport#(get/set)NodeId methods can also be 
removed as they are not used anywhere. 
I'm also unsure about the usage of 
LogAggregationReport#(get/set)DiagnosticMessage as it's only set with an empty 
string.

 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Attachments: YARN-3505.1.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially those that finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3534) Collect node resource utilization

2015-04-30 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3534:
--
Summary: Collect node resource utilization  (was: Report node resource 
utilization)

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522044#comment-14522044
 ] 

Wangda Tan commented on YARN-3243:
--

[~tgraves], I think YARN-3434 needs YARN-3361; it cannot merge cleanly with only 
YARN-3243. Could you check it?

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Report node resource utilization

2015-04-30 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522068#comment-14522068
 ] 

Inigo Goiri commented on YARN-3534:
---

It looks like there's a consensus on splitting monitoring and reporting. If 
everybody is cool with it, I will rename this JIRA to Collect node resource 
utilization and just do the implementation of NodeResourceMonitor.

The question of how to pass this information to the RM still seems open. Should 
I open a new JIRA for that and move the discussion about using REST, heartbeat, 
etc. to that one? Or should we have that discussion in one of the related JIRAs 
(e.g., YARN-3332)?

 Report node resource utilization
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522150#comment-14522150
 ] 

Zhijie Shen commented on YARN-3544:
---

Verified 2.7 branch patch. It works fine too. Will commit it.

 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544-branch-2.7-1.2.patch, YARN-3544-branch-2.7-1.patch, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, node info is also empty. This is usually quite crucial when trying 
 to debug where an AM was launched and to find which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app

2015-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522170#comment-14522170
 ] 

Hudson commented on YARN-3544:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7707 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7707/])
YARN-3544. Got back AM logs link on the RM web UI for a completed app. 
Contributed by Xuan Gong. (zjshen: rev 7e8639fda40c13fe163128d7a725fcd0f2fce3c5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java


 AM logs link missing in the RM UI for a completed app 
 --

 Key: YARN-3544
 URL: https://issues.apache.org/jira/browse/YARN-3544
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Hitesh Shah
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.1

 Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, 
 YARN-3544-branch-2.7-1.2.patch, YARN-3544-branch-2.7-1.patch, 
 YARN-3544.1.patch


 AM log links should always be present (for both running and completed apps).
 Likewise, node info is also empty. This is usually quite crucial when trying 
 to debug where an AM was launched and to find which NM's logs to look at 
 if the AM failed to launch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3566) YARN Scheduler Web UI not properly sorting through Application ID or Progress bar

2015-04-30 Thread Anthony Rojas (JIRA)
Anthony Rojas created YARN-3566:
---

 Summary: YARN Scheduler Web UI not properly sorting through 
Application ID or Progress bar
 Key: YARN-3566
 URL: https://issues.apache.org/jira/browse/YARN-3566
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.5.0
Reporter: Anthony Rojas


Noticed that the progress bar web UI component of the RM WebUI Cluster 
scheduler page is not sorting at all, whereas the RM web UI main view is 
sortable.

The actual web URL that has the broken fields:
http://resource_manager.company.com:8088/cluster/scheduler

This URL however does have functional fields:
http://resource_manager.company.com:8088/cluster/apps

I'll attach a screenshot that shows which specific fields within the Web UI 
table that aren't sorting when clicked on.

Clicking either the Progress Bar column or the Application ID column from 
/cluster/scheduler did not trigger any changes at all. Shouldn't it have sorted 
the jobs in ascending or descending order based on the Application ID or on the 
actual progress shown in the Progress bar?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522066#comment-14522066
 ] 

Thomas Graves commented on YARN-3243:
-

It might not merge completely cleanly, but it wouldn't be required for 
functionality. It would be nice to have this in 2.7 either way though.

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522066#comment-14522066
 ] 

Thomas Graves edited comment on YARN-3243 at 4/30/15 7:02 PM:
--

It might not merge completely cleanly, but it wouldn't be required for 
functionality. It would be nice to have this in 2.7 either way though.

I'll try it out later and see.


was (Author: tgraves):
It might not merge completely cleanly, but it wouldn't be required for 
functionality. It would be nice to have this in 2.7 either way though.

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522241#comment-14522241
 ] 

Wangda Tan commented on YARN-3243:
--

[~tgraves], I just merged this to branch-2.7.

 CapacityScheduler should pass headroom from parent to children to make sure 
 ParentQueue obey its capacity limits.
 -

 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, 
 YARN-3243.4.patch, YARN-3243.5.patch


 Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
 its capacity limits, for example:
 1) When allocating a container of a parent queue, it will only check 
 parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
 > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
 resource limit, as in the following example:
 {code}
          A (usage=54, max=55)
         /  \
 A1 (usage=53, max=53)   A2 (usage=1, max=55)
 {code}
 Queue-A2 is able to allocate a container since its usage < max, but if we do 
 that, A's usage can exceed A.max.
 2) When doing the continuous reservation check, the parent queue will only tell 
 its children "you need to unreserve *some* resource, so that I will be less 
 than my maximum resource", but it will not tell them how much resource needs 
 to be unreserved. This may lead to the parent queue exceeding its configured 
 maximum capacity as well.
 With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue; 
 *here is my proposal*:
 - ParentQueue will set its children's ResourceUsage.headroom, which means the 
 *maximum resource its children can allocate*.
 - ParentQueue will set its children's headroom to be (saying the parent's name 
 is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
 ancestors' capacity is enforced as well (qA.headroom is set by qA's 
 parent).
 - {{needToUnReserve}} is not necessary; instead, children can compute how much 
 resource needs to be unreserved to keep their parent within its resource limit.
 - Moreover, with this, YARN-3026 will make a clear boundary between 
 LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-30 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3539:
-
Attachment: YARN-3539-006.patch

Patch -006 also uprates all of the data structures and {{TimelineClient}} from 
{{@Unstable}} to {{@Evolving}}
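
For readers unfamiliar with the Hadoop stability annotations, the change is of the following shape (the class name here is illustrative, not a line from the actual patch):
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// before: @InterfaceStability.Unstable
@InterfaceAudience.Public
@InterfaceStability.Evolving   // patch -006 moves the timeline data structures to Evolving
public class SomeTimelineDataStructure {
  // fields and methods unchanged by the annotation uprate
}
{code}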

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, 
 YARN-3539-006.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522083#comment-14522083
 ] 

Hadoop QA commented on YARN-3539:
-

(!) The patch artifact directory on has been removed! 
This is a fatal error for test-patch.sh.  Aborting. 
Jenkins (node H3) information at 
https://builds.apache.org/job/PreCommit-YARN-Build/7558/ may provide some hints.

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, 
 YARN-3539-006.patch


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-04-30 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522133#comment-14522133
 ] 

Craig Welch commented on YARN-1680:
---

Hi [~airbots], any luck on this?  Do you mind if I take it on again?

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running and its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are running in the cluster now.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted nodes' memory. This makes 
 jobs hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes but returns an availableResource that still counts the cluster's free 
 memory). 
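
A rough sketch of the kind of correction the JIRA asks for, assuming hypothetical accessors for the blacklisted nodes and their free memory (these are stand-ins, not the real scheduler APIs):
{code}
// Hypothetical sketch: subtract the blacklisted nodes' free memory from the
// headroom reported to the AM, so preemption decisions are not skewed.
public class BlacklistAwareHeadroomSketch {

  interface NodeInfo {            // stand-in for the scheduler's node view
    long getFreeMemoryMb();
  }

  static long adjustedHeadroomMb(long clusterHeadroomMb,
      java.util.Collection<NodeInfo> blacklistedNodes) {
    long blacklistedFreeMb = 0;
    for (NodeInfo node : blacklistedNodes) {
      blacklistedFreeMb += node.getFreeMemoryMb();
    }
    // never report negative headroom
    return Math.max(0, clusterHeadroomMb - blacklistedFreeMb);
  }
}
{code}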



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3566) YARN Scheduler Web UI not properly sorting through Application ID or Progress bar

2015-04-30 Thread Anthony Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Rojas updated YARN-3566:

Attachment: Screen Shot 2015-04-30 at 1.23.56 PM.png

 YARN Scheduler Web UI not properly sorting through Application ID or Progress 
 bar
 -

 Key: YARN-3566
 URL: https://issues.apache.org/jira/browse/YARN-3566
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.5.0
Reporter: Anthony Rojas
 Attachments: Screen Shot 2015-04-30 at 1.23.56 PM.png


 Noticed that the progress bar web UI component of the RM Web UI cluster 
 scheduler page is not sorting at all, whereas the RM web UI main view is 
 sortable.
 The web URL that has the broken fields:
 http://resource_manager.company.com:8088/cluster/scheduler
 This URL, however, does have functional fields:
 http://resource_manager.company.com:8088/cluster/apps
 I'll attach a screenshot that shows which specific fields within the Web UI 
 table aren't sorting when clicked on.
 Clicking either the Progress Bar column or the Application ID column from 
 /cluster/scheduler did not trigger any changes at all. Shouldn't it have 
 sorted the jobs in ascending or descending order based on Application ID or 
 on the actual progress shown in the Progress bar?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3534) Collect node resource utilization

2015-04-30 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3534:
--
Attachment: YARN-3534-4.patch

Isolating the changes for the NodeResourceMonitorImpl. The aggregation of data 
and sending it to the RM will be done in another JIRA.

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch, YARN-3534-4.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.
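
As a rough illustration of what such a monitor samples (this sketch uses the JDK's {{com.sun.management.OperatingSystemMXBean}} purely for demonstration; the actual patch goes through YARN's own resource calculator plumbing):
{code}
import java.lang.management.ManagementFactory;

public class NodeUtilizationSampler implements Runnable {
  @Override
  public void run() {
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();
    // CPU utilization of the whole node, 0.0 - 1.0
    double cpu = os.getSystemCpuLoad();
    // physical memory currently in use on the node, in MB
    long usedMemMb =
        (os.getTotalPhysicalMemorySize() - os.getFreePhysicalMemorySize())
            / (1024 * 1024);
    // a real NodeResourceMonitor would hand these values to the NM status
    // updater so they can ride along in the next RM heartbeat
    System.out.printf("cpu=%.2f usedMemMb=%d%n", cpu, usedMemMb);
  }
}
{code}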



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522180#comment-14522180
 ] 

Wangda Tan commented on YARN-3521:
--

[~sunilg], thanks for updating. I just tried it locally; some comments:
1) It seems the structure of the REST response is not correct for NodeLabelsInfo:
{code}
<nodeLabelsInfo>
  <nodeLabelsInfo>
    <name>x</name>
    <exclusity>true</exclusity>
  </nodeLabelsInfo>
  <nodeLabelsInfo>
    <name>y</name>
    <exclusity>true</exclusity>
  </nodeLabelsInfo>
</nodeLabelsInfo>
{code}

It should be {{nodeLabelInfo}} instead of {{nodeLabelsInfo}}, could you solve 
this issue?

2) It's better to add a test for specifying exclusivity when adding node 
labels. (Verify exclusivity added to NodeLabelsManager).
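
For context, the per-entry element name in a JAXB-backed REST bean is usually controlled with an {{@XmlElement}} annotation; a hypothetical sketch (not the actual YARN class) of a wrapper that would produce {{nodeLabelInfo}} entries:
{code}
import java.util.ArrayList;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "nodeLabelsInfo")        // outer wrapper element
public class NodeLabelsInfoSketch {

  public static class NodeLabelInfoSketch {     // one entry in the list
    @XmlElement public String name;
    @XmlElement public boolean exclusivity;
  }

  private final ArrayList<NodeLabelInfoSketch> labels =
      new ArrayList<NodeLabelInfoSketch>();

  @XmlElement(name = "nodeLabelInfo")           // singular name per entry
  public ArrayList<NodeLabelInfoSketch> getNodeLabelInfo() {
    return labels;
  }
}
{code}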



 Support return structured NodeLabel objects in REST API when call 
 getClusterNodeLabels
 --

 Key: YARN-3521
 URL: https://issues.apache.org/jira/browse/YARN-3521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch


 In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should 
 make the same change in REST API side to make them consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522271#comment-14522271
 ] 

Jian He commented on YARN-3546:
---

bq. Let's consider below situation,
Hi [~sandflee], it's a valid situation, but waitForSchedulerAppAttemptAdded is 
really just a test utility method; it's not used in any production code. The 
whole MockRM is used for testing only.

 AbstractYarnScheduler.getApplicationAttempt seems misleading,  and there're 
 some misuse of it
 -

 Key: YARN-3546
 URL: https://issues.apache.org/jira/browse/YARN-3546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: sandflee

 I'm not familiar with the scheduler; at first glance I thought this function 
 returns the schedulerAppAttempt info corresponding to appAttemptId, but 
 actually it returns the current schedulerAppAttempt.
 It seems to have misled others too, such as
 TestWorkPreservingRMRestart.waitForNumContainersToRecover
 MockRM.waitForSchedulerAppAttemptAdded
 Should I rename it to T getCurrentSchedulerApplicationAttempt(ApplicationId 
 applicationId),
 or have it return null if the current attempt id does not equal the requested attempt id?
 Comments preferred!
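
One possible shape of the null-returning variant suggested in the description; the surrounding class, the {{applications}} map, and the method bodies it relies on are assumptions about the scheduler internals, not quoted from any patch:
{code}
// Hypothetical sketch only: return the attempt for the requested id, or null
// if the requested attempt is no longer the application's current attempt.
public SchedulerApplicationAttempt getCurrentSchedulerApplicationAttempt(
    ApplicationAttemptId requestedId) {
  SchedulerApplication<SchedulerApplicationAttempt> app =
      applications.get(requestedId.getApplicationId());
  if (app == null) {
    return null;
  }
  SchedulerApplicationAttempt current = app.getCurrentAppAttempt();
  if (current == null
      || !current.getApplicationAttemptId().equals(requestedId)) {
    return null;  // caller holds a stale attempt id
  }
  return current;
}
{code}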



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522060#comment-14522060
 ] 

Vinod Kumar Vavilapalli commented on YARN-3445:
---

bq. Jian He, can you file a ticket please?
Actually, we can do this too on YARN-3505, as that is related to 
LogAggregationReport. Please leave a comment there.

 Cache runningApps in RMNode for getting running apps on given NodeId
 

 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3445-v2.patch, YARN-3445.patch


 Per discussion in YARN-3334, we need to filter out unnecessary collector info 
 from the RM in the heartbeat response. Our proposal is to add a cache of runningApps in 
 RMNode, so the RM only sends back collectors for locally running apps. This is also 
 needed for YARN-914 (graceful decommission): if there are no running apps on an NM 
 that is in the decommissioning stage, it will get decommissioned immediately. 
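
A minimal sketch of the proposed cache, with hypothetical names (the real change would live in the RMNode state handling rather than a standalone class):
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node bookkeeping: track which applications still have
// running containers on this node so the heartbeat response only carries
// collectors for those apps.
class RunningAppsCacheSketch<AppId, CollectorAddr> {
  private final Set<AppId> runningApps =
      Collections.newSetFromMap(new ConcurrentHashMap<AppId, Boolean>());

  void containerStartedFor(AppId app)   { runningApps.add(app); }
  void lastContainerFinished(AppId app) { runningApps.remove(app); }

  Map<AppId, CollectorAddr> filterCollectors(Map<AppId, CollectorAddr> all) {
    Map<AppId, CollectorAddr> local = new HashMap<AppId, CollectorAddr>();
    for (Map.Entry<AppId, CollectorAddr> e : all.entrySet()) {
      if (runningApps.contains(e.getKey())) {
        local.put(e.getKey(), e.getValue());
      }
    }
    return local;
  }

  // graceful decommission can proceed as soon as this returns false
  boolean hasRunningApps() { return !runningApps.isEmpty(); }
}
{code}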



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522141#comment-14522141
 ] 

Hadoop QA commented on YARN-3521:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 38s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 19  line(s) that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 34s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m 29s | The applied patch generated  3 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  63m 37s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 104m 45s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729530/0002-YARN-3521.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e2e8f77 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7557/artifact/patchprocess/whitespace.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7557/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7557/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7557/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7557/console |


This message was automatically generated.

 Support return structured NodeLabel objects in REST API when call 
 getClusterNodeLabels
 --

 Key: YARN-3521
 URL: https://issues.apache.org/jira/browse/YARN-3521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch


 In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should 
 make the same change in REST API side to make them consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-04-30 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522144#comment-14522144
 ] 

Chen He commented on YARN-1680:
---

Sure, I just assigned it back to you. I may not have free cycles recently. 

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Craig Welch
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running and its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are running in the cluster now.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted nodes' memory. This makes 
 jobs hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes but returns an availableResource that still counts the cluster's free 
 memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-04-30 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1680:
--
Assignee: Craig Welch  (was: Chen He)

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Craig Welch
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running and its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are running in the cluster now.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted nodes' memory. This makes 
 jobs hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes but returns an availableResource that still counts the cluster's free 
 memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2369) Environment variable handling assumes values should be appended

2015-04-30 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated YARN-2369:
--
Attachment: YARN-2369-2.patch

The new patch has a configurable whitelist for variables with append enabled, as 
yarn.application.variables.with.append.  The existing unit tests are passing 
in TestMRApps.  Next up: a test that tries to append to a variable not on the 
default whitelist and verifies it gets replaced instead of appended to.  
[~jlowe], any thoughts on things that are missing from this latest patch or 
problems with the design?
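
A sketch of the whitelist-driven behaviour described above; the property name comes from the comment, while the class and helper names are made up for illustration:
{code}
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class EnvVarPolicySketch {
  // configured via yarn.application.variables.with.append (illustrative parsing)
  private final Set<String> appendWhitelist;

  EnvVarPolicySketch(String commaSeparatedWhitelist) {
    appendWhitelist = new HashSet<String>(
        Arrays.asList(commaSeparatedWhitelist.split("\\s*,\\s*")));
  }

  /** Append only for whitelisted, path-like variables; otherwise replace. */
  void putEnv(Map<String, String> env, String name, String value) {
    String existing = env.get(name);
    if (existing != null && appendWhitelist.contains(name)) {
      env.put(name, existing + File.pathSeparator + value);
    } else {
      env.put(name, value);
    }
  }
}
{code}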

 Environment variable handling assumes values should be appended
 ---

 Key: YARN-2369
 URL: https://issues.apache.org/jira/browse/YARN-2369
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jason Lowe
Assignee: Dustin Cote
 Attachments: YARN-2369-1.patch, YARN-2369-2.patch


 When processing environment variables for a container context the code 
 assumes that the value should be appended to any pre-existing value in the 
 environment.  This may be desired behavior for handling path-like environment 
 variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
 non-intuitive and harmful way to handle any variable that does not have 
 path-like semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3551) Consolidate data model change according to the backend implementation

2015-04-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522744#comment-14522744
 ] 

Zhijie Shen commented on YARN-3551:
---

Created a new patch to change the timeline metric APIs according to the 
aforementioned comments. In addition, it changes the collection to a TreeMap and 
sorts the data points by timestamp in descending order.
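
A tiny sketch of the data-point container change described above (the type names are placeholders, not the actual TimelineMetric API):
{code}
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class MetricValuesSketch {
  public static void main(String[] args) {
    // Data points keyed by timestamp; the reverse-order comparator keeps
    // iteration newest-first, matching the descending sort described above.
    TreeMap<Long, Number> values =
        new TreeMap<Long, Number>(Collections.reverseOrder());
    values.put(1430400000000L, 42);
    values.put(1430400060000L, 57);
    for (Map.Entry<Long, Number> point : values.entrySet()) {
      System.out.println(point.getKey() + " -> " + point.getValue());
    }
    // prints 1430400060000 -> 57 first
  }
}
{code}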

 Consolidate data model change according to the backend implementation
 -

 Key: YARN-3551
 URL: https://issues.apache.org/jira/browse/YARN-3551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3551-YARN-2928.4.patch, YARN-3551.1.patch, 
 YARN-3551.2.patch, YARN-3551.3.patch


 Based on the comments on 
 [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
  and 
 [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098],
  we need to change the data model to restrict the data types of the 
 info/config/metric sections:
 1. Info: the value can be any kind of object that can be 
 serialized/deserialized by Jackson.
 2. Config: the value will always be assumed to be a String.
 3. Metric: single data or time series values have to be numbers for aggregation.
 Other than that, the info/start time/finish time of a metric do not seem 
 necessary for storage. They should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522766#comment-14522766
 ] 

Vinod Kumar Vavilapalli commented on YARN-3481:
---

bq. Vinod Kumar Vavilapalli, it looks like YARN-2965 is very similar to this. 
Actually, this also looks like a clone to YARN-1012. Anyway, from what I 
understand, those JIRAs want to send utilization metrics in the heartbeat and 
that's pretty much what I'm targeting here. My current prototype extends 
ContainersMonitorImpl and puts this information into the NodeHealthStatus. I 
think I could do that in any of those JIRAs. 
Okay, I am going to assign YARN-1012 to you and close this as dup. Will also 
make YARN-3534 a sub-task of YARN-1011.

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-3481
 URL: https://issues.apache.org/jira/browse/YARN-3481
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
   Original Estimate: 336h
  Remaining Estimate: 336h

 To allow the RM to make better scheduling decisions, it should be aware of the 
 actual utilization of the containers. The NM would aggregate the 
 ContainerMetrics and report them in every heartbeat.
 Related to YARN-1012 but aggregated to reduce the heartbeat overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-3481.
---
Resolution: Duplicate

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-3481
 URL: https://issues.apache.org/jira/browse/YARN-3481
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
   Original Estimate: 336h
  Remaining Estimate: 336h

 To allow the RM to make better scheduling decisions, it should be aware of the 
 actual utilization of the containers. The NM would aggregate the 
 ContainerMetrics and report them in every heartbeat.
 Related to YARN-1012 but aggregated to reduce the heartbeat overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522774#comment-14522774
 ] 

Vinod Kumar Vavilapalli commented on YARN-3332:
---

Unfortunately, other pieces started moving in sooner than I could start on 
this: YARN-3534 (in progress), YARN-3334 (part of Timeline Service next-gen, 
YARN-2928). So I am planning to do a refactor once those two go into trunk.

Thanks for offering to get involved; once they go in, I can file sub-tasks for moving 
forward.

 [Umbrella] Unified Resource Statistics Collection per node
 --

 Key: YARN-3332
 URL: https://issues.apache.org/jira/browse/YARN-3332
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: Design - UnifiedResourceStatisticsCollection.pdf


 Today in YARN, NodeManager collects statistics like per container resource 
 usage and overall physical resources available on the machine. Currently this 
 is used internally in YARN by the NodeManager for only a limited usage: 
 automatically determining the capacity of resources on node and enforcing 
 memory usage to what is reserved per container.
 This proposal is to extend the existing architecture and collect statistics 
 for usage beyond the existing use-cases.
 Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522790#comment-14522790
 ] 

Vinod Kumar Vavilapalli commented on YARN-3044:
---

Apologies for dropping off the "send info from RM vs from NM" discussion mid-way 
through.

We all agree that sending information from NMs is *more scalable*.

The concern isn't really about information ownership. The RM and NM both form the 
platform, so we can rely on NMs to publish information. It is really about 
potential *loss of information* in many not-so-rare cases, e.g. when a container 
gets allocated but is preempted or released by the AM before it really starts.

As long as containers successfully start on NMs (which will be the vast 
majority, assuming the cluster isn't bad), we can rely on NMs to post all sorts 
of information - allocation time, wait time, execution time, and information like 
priority, host, port, resource-usage-over-time, etc. We can just tunnel some of 
the RM-originated information through AMs to the NM.

The missing dots occur when a container's life-cycle ends either on the RM or 
the AM. We can take a dual-pronged approach here? That, or we make the 
RM-publisher itself a distributed push.

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2619) NodeManager: Add cgroups support for disk I/O isolation

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522754#comment-14522754
 ] 

Vinod Kumar Vavilapalli commented on YARN-2619:
---

Looks good to me too, +1. Checking this in..

 NodeManager: Add cgroups support for disk I/O isolation
 ---

 Key: YARN-2619
 URL: https://issues.apache.org/jira/browse/YARN-2619
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2619-1.patch, YARN-2619.002.patch, 
 YARN-2619.003.patch, YARN-2619.004.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522779#comment-14522779
 ] 

Vinod Kumar Vavilapalli commented on YARN-2942:
---

That will definitely simplify things a lot more IMO; we will no longer need a 
ZK dependency in the core of YARN (outside of HA).

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CombinedAggregatedLogsProposal_v6.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, 
 ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2123) Progress bars in Web UI always at 100% due to non-US locale

2015-04-30 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522781#comment-14522781
 ] 

Tsuyoshi Ozawa commented on YARN-2123:
--

[~ajisakaa] did you maybe forget to attach the v4 patch?

 Progress bars in Web UI always at 100% due to non-US locale
 ---

 Key: YARN-2123
 URL: https://issues.apache.org/jira/browse/YARN-2123
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.3.0
Reporter: Johannes Simon
Assignee: Akira AJISAKA
 Attachments: NaN_after_launching_RM.png, YARN-2123-001.patch, 
 YARN-2123-002.patch, YARN-2123-003.patch, fair-scheduler-ajisaka.xml, 
 screenshot-noPatch.png, screenshot-patch.png, screenshot.png, 
 yarn-site-ajisaka.xml


 In our cluster setup, the YARN web UI always shows progress bars at 100% (see 
 screenshot, progress of the reduce step is roughly at 32.82%). I opened the 
 HTML source code to check (also see screenshot), and it seems the problem is 
 that it uses a comma as decimal mark, where most browsers expect a dot for 
 floating-point numbers. This could possibly be due to localized number 
 formatting being used in the wrong place, which would also explain why this 
 bug is not always visible.
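
The failure mode can be reproduced in a couple of lines (illustrative only; the web UI goes through its own formatting path):
{code}
import java.util.Locale;

public class LocaleProgressSketch {
  public static void main(String[] args) {
    float progress = 32.82f;
    // With a non-US default locale the decimal mark becomes a comma,
    // which browsers reject when parsing the progress-bar width.
    System.out.println(String.format(Locale.GERMANY, "%.2f", progress)); // 32,82
    // Formatting explicitly with Locale.US (or Locale.ENGLISH) avoids it.
    System.out.println(String.format(Locale.US, "%.2f", progress));      // 32.82
  }
}
{code}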



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3551) Consolidate data model change according to the backend implementation

2015-04-30 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3551:
--
Attachment: YARN-3551-YARN-2928.4.patch

 Consolidate data model change according to the backend implementation
 -

 Key: YARN-3551
 URL: https://issues.apache.org/jira/browse/YARN-3551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3551-YARN-2928.4.patch, YARN-3551.1.patch, 
 YARN-3551.2.patch, YARN-3551.3.patch


 Based on the comments on 
 [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
  and 
 [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098],
  we need to change the data model to restrict the data types of the 
 info/config/metric sections:
 1. Info: the value can be any kind of object that can be 
 serialized/deserialized by Jackson.
 2. Config: the value will always be assumed to be a String.
 3. Metric: single data or time series values have to be numbers for aggregation.
 Other than that, the info/start time/finish time of a metric do not seem 
 necessary for storage. They should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Collect node resource utilization

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522784#comment-14522784
 ] 

Hadoop QA commented on YARN-3534:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | reexec |   0m  0s | dev-support patch detected. |
| {color:blue}0{color} | pre-patch |  14m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:blue}0{color} | @author |   0m  0s | Skipping @author checks as 
test-patch has been patched. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   6m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   7m 25s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |  22m 41s | The applied patch generated 
364 release audit warnings. |
| {color:red}-1{color} | checkstyle |   4m 26s | The applied patch generated  2 
 additional checkstyle issues. |
| {color:blue}0{color} | shellcheck |   4m 26s | Shellcheck was not available. |
| {color:green}+1{color} | install |   1m 12s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 30s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m  4s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests |  23m  7s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   5m 47s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  90m 58s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729676/YARN-3534-5.patch |
| Optional Tests | shellcheck javadoc javac unit findbugs checkstyle |
| git revision | trunk / 98a6176 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/console |


This message was automatically generated.

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3551) Consolidate data model change according to the backend implementation

2015-04-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522785#comment-14522785
 ] 

Sangjin Lee commented on YARN-3551:
---

The latest patch looks good. Thanks for addressing the feedback and updating 
the patch, [~zjshen]!

There is one nit: I would prefer method and field names that don't include 
timeSeries in them, as the metric can be used for either a single value or a 
time series.

How about the following? timeSeries => values, getTimeSeriesJAXB() => 
getValuesJAXB(), getTimeSeries() => getValues(), setTimeSeries() => 
setValues(), addTimeSeries() => addValues(), addTimeSeriesData() => addValue()

 Consolidate data model change according to the backend implementation
 -

 Key: YARN-3551
 URL: https://issues.apache.org/jira/browse/YARN-3551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3551-YARN-2928.4.patch, YARN-3551.1.patch, 
 YARN-3551.2.patch, YARN-3551.3.patch


 Based on the comments on 
 [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
  and 
 [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098],
  we need to change the data model to restrict the data types of the 
 info/config/metric sections:
 1. Info: the value can be any kind of object that can be 
 serialized/deserialized by Jackson.
 2. Config: the value will always be assumed to be a String.
 3. Metric: single data or time series values have to be numbers for aggregation.
 Other than that, the info/start time/finish time of a metric do not seem 
 necessary for storage. They should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3534) Collect node resource utilization

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3534:
--
Issue Type: Sub-task  (was: New Feature)
Parent: YARN-1011

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Collect node resource utilization

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522768#comment-14522768
 ] 

Vinod Kumar Vavilapalli commented on YARN-3534:
---

Converting this to be a sub-task of YARN-1011. See my last comment at YARN-3481.

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1012:
--
Assignee: Inigo Goiri  (was: Vinod Kumar Vavilapalli)

YARN-3481 is closed as dup of this JIRA.

Assigning this one to [~elgoiri] who was moving forward on that JIRA.

 NM should report resource utilization of running containers to RM in heartbeat
 --

 Key: YARN-1012
 URL: https://issues.apache.org/jira/browse/YARN-1012
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Inigo Goiri





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522325#comment-14522325
 ] 

Hadoop QA commented on YARN-2369:
-

(!) The patch artifact directory on has been removed! 
This is a fatal error for test-patch.sh.  Aborting. 
Jenkins (node H4) information at 
https://builds.apache.org/job/PreCommit-YARN-Build/7559/ may provide some hints.

 Environment variable handling assumes values should be appended
 ---

 Key: YARN-2369
 URL: https://issues.apache.org/jira/browse/YARN-2369
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jason Lowe
Assignee: Dustin Cote
 Attachments: YARN-2369-1.patch, YARN-2369-2.patch


 When processing environment variables for a container context the code 
 assumes that the value should be appended to any pre-existing value in the 
 environment.  This may be desired behavior for handling path-like environment 
 variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a 
 non-intuitive and harmful way to handle any variable that does not have 
 path-like semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API

2015-04-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522418#comment-14522418
 ] 

Zhijie Shen commented on YARN-3539:
---

We perhaps need to mark the generic history API related classes/methods as stable 
too, or we may want to exclude them from this JIRA. Those classes are 
ApplicationBaseProtocol, YarnClient, ApplicationReport, 
ApplicationAttemptReport and ContainerReport.

 Compatibility doc to state that ATS v1 is a stable REST API
 ---

 Key: YARN-3539
 URL: https://issues.apache.org/jira/browse/YARN-3539
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, 
 YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, 
 YARN-3539-006.patch, timeline_get_api_examples.txt


 The ATS v2 discussion and YARN-2423 have raised the question: how stable are 
 the ATSv1 APIs?
 The existing compatibility document actually states that the History Server 
 is [a stable REST 
 API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs],
  which effectively means that ATSv1 has already been declared as a stable API.
 Clarify this by patching the compatibility document appropriately



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3546) AbstractYarnScheduler.getApplicationAttempt seems misleading, and there're some misuse of it

2015-04-30 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522456#comment-14522456
 ] 

sandflee commented on YARN-3546:


ok, close it now, thanks [~jianhe]

 AbstractYarnScheduler.getApplicationAttempt seems misleading,  and there're 
 some misuse of it
 -

 Key: YARN-3546
 URL: https://issues.apache.org/jira/browse/YARN-3546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: sandflee

 I'm not familiar with the scheduler; at first glance I thought this function 
 returns the schedulerAppAttempt info corresponding to appAttemptId, but 
 actually it returns the current schedulerAppAttempt.
 It seems to have misled others too, such as
 TestWorkPreservingRMRestart.waitForNumContainersToRecover
 MockRM.waitForSchedulerAppAttemptAdded
 Should I rename it to T getCurrentSchedulerApplicationAttempt(ApplicationId 
 applicationId),
 or have it return null if the current attempt id does not equal the requested attempt id?
 Comments preferred!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522569#comment-14522569
 ] 

Sangjin Lee commented on YARN-3411:
---

+1 on 1.0.1.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522556#comment-14522556
 ] 

Li Lu commented on YARN-3411:
-

Awesome, thanks! 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-04-30 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1462:

Attachment: YARN-1462-branch-2.7-1.patch

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, 
 YARN-1462.2.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-04-30 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1462:

Attachment: YARN-1462.2.patch

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, 
 YARN-1462.2.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522678#comment-14522678
 ] 

Hadoop QA commented on YARN-3134:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 48s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 34s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  26m  8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729627/YARN-3134-YARN-2928.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b689f5d |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7562/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7562/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7562/console |


This message was automatically generated.

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Li Lu
 Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
 YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
 YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
 YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
 YARN-3134DataSchema.pdf


 Quoting the introduction on the Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and it makes 
 it easy to build indexes and compose complex queries.
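
For a sense of what "client-embedded JDBC driver" means in practice, a minimal Phoenix access sketch; the table name and ZooKeeper quorum ("zk-host:2181") are placeholders, not anything from the proposed schema:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixAccessSketch {
  public static void main(String[] args) throws Exception {
    // "zk-host:2181" stands in for the HBase ZooKeeper quorum.
    Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
    Statement stmt = conn.createStatement();
    stmt.executeUpdate(
        "CREATE TABLE IF NOT EXISTS entity (id VARCHAR PRIMARY KEY, created BIGINT)");
    stmt.executeUpdate("UPSERT INTO entity VALUES ('app_0001', 1430400000000)");
    conn.commit();  // Phoenix does not auto-commit by default
    ResultSet rs = stmt.executeQuery("SELECT id, created FROM entity");
    while (rs.next()) {
      System.out.println(rs.getString(1) + " " + rs.getLong(2));
    }
    conn.close();
  }
}
{code}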



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Collect node resource utilization

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522724#comment-14522724
 ] 

Hadoop QA commented on YARN-3534:
-

(!) A patch to test-patch or smart-apply-patch has been detected. 
Re-executing against the patched versions to perform further tests. 
The console is at 
https://builds.apache.org/job/PreCommit-YARN-Build/7564/console in case of 
problems.

 Collect node resource utilization
 -

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, 
 YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. For this, this task will implement the NodeResourceMonitor and 
 send this information to the Resource Manager in the heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-04-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522737#comment-14522737
 ] 

zhihai xu commented on YARN-2893:
-

The TestContainerAllocation failure is not related to my change; it was just fixed 
in YARN-3564.
Also, this checkstyle issue may be caused by the import statement:
{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
{code}
but this import statement doesn't look like an issue to me.
I found a similar checkstyle issue at MAPREDUCE-6339, which was caused by an 
import statement.
Hi [~jira.shegalov], do you want me to do the same experiment as in MAPREDUCE-6339 
to prove that the import statement causes this checkstyle issue?

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: zhihai xu
 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, 
 YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, 
 YARN-2893.005.patch


 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-30 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522639#comment-14522639
 ] 

Jun Gong commented on YARN-3366:


[~sidharta-s] Thank you for the explanation. Could we set YARNRootClass's ceil 
rate to yarnBandwidthMbit when 'strictMode' (configured through 
YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE) is set to 
true, and to rootBandwidthMbit otherwise? If cluster admins need to enforce a 
maximum bandwidth for YARN, they can set strictMode to true.
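A minimal sketch of that selection logic follows; the method and variable names 
are illustrative, and only the configuration key is taken from YarnConfiguration:
{code}
// Minimal sketch of the proposal above: pick the YARN root traffic class's
// ceil rate based on the strict-resource-usage flag. Names are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnRootCeilSketch {
  static int chooseYarnRootCeilMbit(Configuration conf,
      int rootBandwidthMbit, int yarnBandwidthMbit) {
    boolean strictMode = conf.getBoolean(
        YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE,
        false);  // defaulting to false here is an assumption for the sketch
    // Strict mode: cap YARN traffic at its own allotment; otherwise allow
    // YARN containers to borrow up to the full root (interface) bandwidth.
    return strictMode ? yarnBandwidthMbit : rootBandwidthMbit;
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // e.g. root interface bandwidth 1000 Mbit, YARN allotment 700 Mbit
    System.out.println(chooseYarnRootCeilMbit(conf, 1000, 700));
  }
}
{code}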

 Outbound network bandwidth : classify/shape traffic originating from YARN 
 containers
 

 Key: YARN-3366
 URL: https://issues.apache.org/jira/browse/YARN-3366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
 YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
 YARN-3366.006.patch, YARN-3366.007.patch


 In order to be able to isolate based on/enforce outbound traffic bandwidth 
 limits, we need  a mechanism to classify/shape network traffic in the 
 nodemanager. For more information on the design, please see the attached 
 design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-04-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522672#comment-14522672
 ] 

Xuan Gong commented on YARN-1462:
-

Addressed all the latest comments, and created a patch for branch-2.7.

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, 
 YARN-1462.2.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522730#comment-14522730
 ] 

Hadoop QA commented on YARN-1462:
-

(!) The patch artifact directory has been removed! 
This is a fatal error for test-patch.sh.  Aborting. 
Jenkins (node H3) information at 
https://builds.apache.org/job/PreCommit-YARN-Build/7566/ may provide some hints.

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, 
 YARN-1462.2.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-30 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3411:
-
Attachment: YARN-3411.poc.4.txt

Hi [~gtCarrera9]

Yes, sure, we can use HBase 1.0.1. Attaching an updated patch that uses HBase 
1.0.1; it works fine for the unit tests.

We will also have the HBase cluster set up with version 1.0.1.

thanks
Vrushali
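
For reference, a minimal sketch of the HBase 1.0.1 client write path this POC 
builds on; the table name, column family, and row-key layout below are 
placeholders, not the schema in the attached patch:
{code}
// Sketch of a write against the HBase 1.0.1 client API. Table name, column
// family, and row-key layout are placeholders for illustration only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineEntityWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("timeline_entity"))) {
      // Example row key: cluster!user!flow!run!appId!entityType!entityId
      byte[] rowKey =
          Bytes.toBytes("cluster1!user1!flow1!1!app_1!YARN_APP!app_1");
      Put put = new Put(rowKey);
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("created_time"),
          Bytes.toBytes(System.currentTimeMillis()));
      table.put(put);
    }
  }
}
{code}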

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-30 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522642#comment-14522642
 ] 

Robert Kanter commented on YARN-2942:
-

Thanks for pointing me to YARN-1376 and related. I'll have to look into the code 
to get a better idea, but perhaps we can take advantage of this to take a 
completely different approach to combining the logs. Now that we have a way of 
checking the status of log aggregation across all nodes in the cluster, instead 
of having to use ZK locks to coordinate all the NMs appending the logs, we could 
have a single server append them (maybe a small thread pool in the RM that 
handles this?). We'd still use append and the new format, but we wouldn't need 
ZooKeeper, and using a single server to do the combining should simplify things. 
We'd probably need to add new {{LogAggregationStatus}} enum values such as 
COMBINING and COMBINED.
I'll look into this some more; what do you think [~vinodkv], [~jlowe], 
[~knoguchi]?
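
Purely as a hypothetical sketch of the single-combiner idea (not an existing YARN 
API): a small thread pool that folds the per-node aggregated log files of an 
application into one combined file. A real implementation would merge entries in 
the aggregated log format rather than concatenating raw bytes, but the control 
flow would look roughly like this:
{code}
// Hypothetical sketch only: a small pool (e.g. in the RM) that combines the
// per-node aggregated log files of an application into a single file.
// Paths, pool size, and error handling are placeholders.
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class LogCombinerSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public void submitCombine(final Configuration conf, final Path combinedFile,
      final List<Path> perNodeLogs) {
    pool.submit(new Runnable() {
      @Override
      public void run() {
        try {
          FileSystem fs = combinedFile.getFileSystem(conf);
          // Assumes HDFS append is enabled; raw byte copy stands in for
          // format-aware merging of the aggregated log entries.
          try (FSDataOutputStream out = fs.exists(combinedFile)
              ? fs.append(combinedFile) : fs.create(combinedFile)) {
            for (Path nodeLog : perNodeLogs) {
              try (FSDataInputStream in = fs.open(nodeLog)) {
                IOUtils.copyBytes(in, out, conf, false);
              }
            }
          }
        } catch (IOException e) {
          // A real combiner would move the app out of a COMBINING state
          // and schedule a retry instead of just logging.
          e.printStackTrace();
        }
      }
    });
  }
}
{code}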

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CombinedAggregatedLogsProposal_v6.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, 
 ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

