[jira] [Commented] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061777#comment-14061777 ] Hadoop QA commented on YARN-2270: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655703/YARN-2270.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4302//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4302//console This message is automatically generated. > TestFSDownload#testDownloadPublicWithStatCache fails in trunk > - > > Key: YARN-2270 > URL: https://issues.apache.org/jira/browse/YARN-2270 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.4.1 >Reporter: Ted Yu >Assignee: Akira AJISAKA >Priority: Minor > Attachments: YARN-2270.patch > > > From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console : > {code} > Running org.apache.hadoop.yarn.util.TestFSDownload > Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec <<< > FAILURE! 
- in org.apache.hadoop.yarn.util.TestFSDownload > testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload) > Time elapsed: 0.137 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363) > {code} > Similar error can be seen here: > https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/ > Looks like future.get() returned null. -- This message was sent by Atlassian JIRA (v6.2#6252)
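The suspected failure mode (future.get() returning null, which then trips an assertTrue downstream) can be reproduced in isolation. The following is a minimal standalone sketch, not the FSDownload or TestFSDownload code: if the Callable submitted to an executor returns null, the Future's result is null as well.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureNullSketch {
    // Returns whatever the submitted Callable produced; a Callable that
    // returns null makes future.get() null too, and any assertion built on
    // that result will fail exactly as in the stack trace above.
    public static Object runTask() {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<Object> future = exec.submit(() -> null); // stand-in for the stat-cache task
            return future.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            exec.shutdown();
        }
    }
}
```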
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1408: -- Attachment: (was: Yarn-1408.11.patch) > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Affects Versions: 2.2.0 > Reporter: Sunil G > Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows: > * yarn.resourcemanager.scheduler.monitor.enable=true > * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queues = a, b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Submit a big jobA to queue a, which uses the full cluster capacity. > Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster capacity. > The jobA task using queue b's capacity is preempted and killed. > This caused the following problem: > 1. A new container was allocated for jobA in Queue A as per a node update from an NM. > 2. This container was immediately preempted. > Here, an "ACQUIRED at KILLED" invalid state exception occurred when the next AM heartbeat reached the RM: > ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED > This also caused the task to time out after 30 minutes, as the container was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
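The scenario above can be illustrated with a toy state machine (this is a standalone sketch, not Hadoop's StateMachineFactory or RMContainerImpl): an ACQUIRED event delivered by a late AM heartbeat after preemption has already moved the container to KILLED. Registering an explicit no-op transition for (KILLED, ACQUIRED) turns the crash into a harmless ignore; whether that is the right fix for RMContainerImpl is what the attached patches address.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

public class ContainerStateSketch {
    public enum State { NEW, ALLOCATED, ACQUIRED, KILLED }
    public enum Event { ALLOCATE, ACQUIRE, KILL }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State state = State.NEW;

    public void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    public State handle(Event e) {
        State next = table.getOrDefault(state, Collections.emptyMap()).get(e);
        if (next == null) {
            // Mirrors the InvalidStateTransitonException in the log above.
            throw new IllegalStateException("Invalid event: " + e + " at " + state);
        }
        state = next;
        return state;
    }

    public static State demo() {
        ContainerStateSketch c = new ContainerStateSketch();
        c.addTransition(State.NEW, Event.ALLOCATE, State.ALLOCATED);
        c.addTransition(State.ALLOCATED, Event.KILL, State.KILLED);
        // Without this registration, the late ACQUIRE below would throw.
        c.addTransition(State.KILLED, Event.ACQUIRE, State.KILLED);
        c.handle(Event.ALLOCATE);
        c.handle(Event.KILL);           // preemption kills the container
        return c.handle(Event.ACQUIRE); // late AM heartbeat: ignored, stays KILLED
    }
}
```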
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-1408: -- Attachment: Yarn-1408.11.patch The test case failures are in webapp and are due to a connection bind exception. I corrected the visibility as mentioned by [~jianhe]. Attaching the patch again to re-run the test cases. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Affects Versions: 2.2.0 > Reporter: Sunil G > Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows: > * yarn.resourcemanager.scheduler.monitor.enable=true > * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queues = a, b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Submit a big jobA to queue a, which uses the full cluster capacity. > Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster capacity. > The jobA task using queue b's capacity is preempted and killed. > This caused the following problem: > 1. A new container was allocated for jobA in Queue A as per a node update from an NM. > 2. This container was immediately preempted. > Here, an "ACQUIRED at KILLED" invalid state exception occurred when the next AM heartbeat reached the RM: > ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED > This also caused the task to time out after 30 minutes, as the container was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2287) Add audit log levels for NM and RM
Varun Saxena created YARN-2287: -- Summary: Add audit log levels for NM and RM Key: YARN-2287 URL: https://issues.apache.org/jira/browse/YARN-2287 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.4.1 Reporter: Varun Saxena NM and RM audit logging can be done based on log level, as some of the audit logs, especially the container audit logs, appear too many times. By introducing log levels, certain audit logs can be suppressed if not required in a deployment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: ProposalofStoringYARNMetricsintotheTimelineStore.pdf > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Vinod Kumar Vavilapalli > Assignee: Vinod Kumar Vavilapalli > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf > > > Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. > One goal is to try to keep the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061873#comment-14061873 ] Zhijie Shen commented on YARN-2033: --- Reassigning the ticket to myself. > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Vinod Kumar Vavilapalli > Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf > > > Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. > One goal is to try to keep the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-2033: - Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Vinod Kumar Vavilapalli > Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf > > > Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. > One goal is to try to keep the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061872#comment-14061872 ] Hadoop QA commented on YARN-1408: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655718/Yarn-1408.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4303//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4303//console This message is automatically generated. 
> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Affects Versions: 2.2.0 > Reporter: Sunil G > Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows: > * yarn.resourcemanager.scheduler.monitor.enable=true > * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queues = a, b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Submit a big jobA to queue a, which uses the full cluster capacity. > Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster capacity. > The jobA task using queue b's capacity is preempted and killed. > This caused the following problem: > 1. A new container was allocated for jobA in Queue A as per a node update from an NM. > 2. This container was immediately preempted. > Here, an "ACQUIRED at KILLED" invalid state exception occurred when the next AM heartbeat reached the RM: > ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED > This also caused the task to time out after 30 minutes, as the container was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.Prototype.patch Uploading the proposal of changes and the demo code. > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Vinod Kumar Vavilapalli > Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.Prototype.patch > > > Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. > One goal is to try to keep the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061889#comment-14061889 ] Zhijie Shen commented on YARN-2033: --- bq. I would think the timeline store might have to support storing a lot more information than the history store. In that case, one might want to keep them separate? IMHO, it depends on the use cases. Either the generic or the application-specific metrics can be large, or both. The problem with keeping them separate is having two different sets of interfaces, which doubles our maintenance and upgrade effort. For example, we've implemented data retention and caching for the LevelDB-based timeline store, but generic history cannot take advantage of them unless we implement similar features for the application history store again. I don't think keeping both metric sets in the same store will cause much trouble for either. With a unified store interface, we can focus the effort on improving the store implementation, which can still isolate the two metric sets (e.g., by storing them in two tables). > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Vinod Kumar Vavilapalli > Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.Prototype.patch > > > Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. > One goal is to try to keep the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
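The direction argued in the comment above can be sketched as follows. This is a hypothetical illustration, not the actual YARN TimelineStore API: one store serving both generic history and application-specific timeline data, while the implementation keeps the two data sets isolated (here, one in-memory map per "table"), so features like retention and caching are implemented once and shared.

```java
import java.util.HashMap;
import java.util.Map;

public class UnifiedStoreSketch {
    // One backing structure per logical table, e.g. "generic-history" vs "timeline".
    private final Map<String, Map<String, byte[]>> tables = new HashMap<>();

    public void put(String table, String entityId, byte[] payload) {
        tables.computeIfAbsent(table, k -> new HashMap<>()).put(entityId, payload);
    }

    public byte[] get(String table, String entityId) {
        Map<String, byte[]> t = tables.get(table);
        return t == null ? null : t.get(entityId);
    }
}
```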
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061891#comment-14061891 ] Junping Du commented on YARN-1341: -- Hey [~jlowe], I also agree it is better to discuss the inconsistency scenarios for each case on separate JIRAs. However, for now, our conclusions from these discussions can only be verified in theory; there may still be bugs or issues in practice. Thus, I also suggest we have a central place to document these assumptions/conclusions from the discussions; it would help us and others in the community identify potential issues when coming up with UTs or other integration tests on negative cases later. What do you think? If you agree, we can move this documentation effort to another JIRA (an umbrella or a dedicated one, whichever you prefer) and continue the discussion on this particular case. For this particular one, the assumptions from the discussion above seem to be: if the NM restarts with stale keys, a. if currentMasterKey is stale, it will be updated and overridden soon after re-registering with the RM; nothing is affected. b. if previousMasterKey is stale, then the real previous master key is lost, so the effect is: AMs holding the real previous master key cannot connect to the NM to launch containers. c. if applicationMasterKeys are stale, then the old keys tracked in applicationMasterKeys are lost after restart; the effect is: AMs with old keys cannot connect to the NM to launch containers. I would prefer option 1 too, given the effects listed here. Anything I am missing?
> Recover NMTokens upon nodemanager restart > - > > Key: YARN-1341 > URL: https://issues.apache.org/jira/browse/YARN-1341 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, > YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2288) Data persistent in timelinestore should be versioned
Junping Du created YARN-2288: Summary: Data persistent in timelinestore should be versioned Key: YARN-2288 URL: https://issues.apache.org/jira/browse/YARN-2288 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du We have a LevelDB-backed TimelineStore; it should have a schema version to accommodate future schema changes. -- This message was sent by Atlassian JIRA (v6.2#6252)
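A minimal sketch of the versioning idea (the key name and method are assumptions for illustration, not the eventual patch): store a schema version under a reserved key, stamp it on first start, and refuse to load data written under an incompatible version. The Map stands in for the LevelDB instance (db.get/db.put in real code).

```java
import java.util.Map;

public class SchemaVersionSketch {
    static final String VERSION_KEY = "timeline-store-version"; // hypothetical reserved key
    static final int CURRENT_VERSION = 1;

    public static boolean isCompatible(Map<String, Integer> db) {
        Integer stored = db.get(VERSION_KEY);
        if (stored == null) {             // fresh store: stamp the current version
            db.put(VERSION_KEY, CURRENT_VERSION);
            return true;
        }
        return stored == CURRENT_VERSION; // mismatch => caller must migrate or fail fast
    }
}
```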
[jira] [Updated] (YARN-2289) ApplicationHistoryStore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2289: - Component/s: applications > ApplicationHistoryStore should be versioned > --- > > Key: YARN-2289 > URL: https://issues.apache.org/jira/browse/YARN-2289 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications >Reporter: Junping Du > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2289) ApplicationHistoryStore should be versioned
Junping Du created YARN-2289: Summary: ApplicationHistoryStore should be versioned Key: YARN-2289 URL: https://issues.apache.org/jira/browse/YARN-2289 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2289) ApplicationHistoryStore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061905#comment-14061905 ] Junping Du commented on YARN-2289: -- Generic History Server is being refactored to be based on TimelineStore. > ApplicationHistoryStore should be versioned > --- > > Key: YARN-2289 > URL: https://issues.apache.org/jira/browse/YARN-2289 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications >Reporter: Junping Du > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061907#comment-14061907 ] Varun Saxena commented on YARN-2256: Adding log levels in the RM and NM is addressed by this issue. > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager > Affects Versions: 2.4.0 > Reporter: Varun Saxena > > The following audit logs are generated too many times (due to the possibility of a large number of containers): > 1. In the NM - audit logs corresponding to starting, stopping and finishing a container > 2. In the RM - audit logs corresponding to the AM allocating a container and the AM releasing a container > We can have different log levels for the NM and RM audit logs and move these successful-container logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061908#comment-14061908 ] Varun Saxena commented on YARN-2256: Changed the NM container logs to DEBUG level so that they don't appear in the audit logs by default. > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager > Affects Versions: 2.4.0 > Reporter: Varun Saxena > > The following audit logs are generated too many times (due to the possibility of a large number of containers): > 1. In the NM - audit logs corresponding to starting, stopping and finishing a container > 2. In the RM - audit logs corresponding to the AM allocating a container and the AM releasing a container > We can have different log levels for the NM and RM audit logs and move these successful-container logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2256: --- Attachment: YARN-2256.patch Please review the patch. > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager > Affects Versions: 2.4.0 > Reporter: Varun Saxena > Attachments: YARN-2256.patch > > > The following audit logs are generated too many times (due to the possibility of a large number of containers): > 1. In the NM - audit logs corresponding to starting, stopping and finishing a container > 2. In the RM - audit logs corresponding to the AM allocating a container and the AM releasing a container > We can have different log levels for the NM and RM audit logs and move these successful-container logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2287) Add audit log levels for NM and RM
[ https://issues.apache.org/jira/browse/YARN-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061919#comment-14061919 ] Varun Saxena commented on YARN-2287: I will make the following changes: 1. Create new logSuccess and logFailure methods with an additional parameter indicating the log level. This can be an enum in RMAuditLogger and NMAuditLogger. 2. The previous logSuccess method will continue printing logs at INFO level; the new method can be used to print logs at the appropriate level. > Add audit log levels for NM and RM > -- > > Key: YARN-2287 > URL: https://issues.apache.org/jira/browse/YARN-2287 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager > Affects Versions: 2.4.1 > Reporter: Varun Saxena > > NM and RM audit logging can be done based on log level, as some of the audit logs, especially the container audit logs, appear too many times. By introducing log levels, certain audit logs can be suppressed if not required in a deployment. -- This message was sent by Atlassian JIRA (v6.2#6252)
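The proposed change can be sketched as below. The names (AuditLevel, logSuccess) follow the comment above, but the exact signatures in RMAuditLogger/NMAuditLogger may differ; this sketch returns the formatted line instead of dispatching to a real logger so the routing is visible. The existing no-level overload keeps today's INFO behavior, so current callers are untouched, while per-container logs would call the new overload with DEBUG and be suppressed at the default log level.

```java
public class AuditLoggerSketch {
    public enum AuditLevel { DEBUG, INFO, WARN, ERROR }

    // Existing behavior: defaults to INFO, so current call sites are unchanged.
    public static String logSuccess(String user, String op, String target) {
        return logSuccess(user, op, target, AuditLevel.INFO);
    }

    // New overload: the caller picks the level, e.g. DEBUG for container events.
    public static String logSuccess(String user, String op, String target,
                                    AuditLevel level) {
        // A real logger would dispatch on 'level'; here we just prefix it.
        return level + " USER=" + user + " OPERATION=" + op
                + " TARGET=" + target + " RESULT=SUCCESS";
    }
}
```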
[jira] [Commented] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061922#comment-14061922 ] Hadoop QA commented on YARN-2256: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655734/YARN-2256.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4304//console This message is automatically generated. > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager > Affects Versions: 2.4.0 > Reporter: Varun Saxena > Attachments: YARN-2256.patch > > > The following audit logs are generated too many times (due to the possibility of a large number of containers): > 1. In the NM - audit logs corresponding to starting, stopping and finishing a container > 2. In the RM - audit logs corresponding to the AM allocating a container and the AM releasing a container > We can have different log levels for the NM and RM audit logs and move these successful-container logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple
[ https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061930#comment-14061930 ] Hudson commented on YARN-2228: -- FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/613/]) YARN-2228. Augmented TimelineServer to load pseudo authentication filter when authentication = simple. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610575) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/ForbiddenException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/GenericExceptionHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestMemoryApplicationHistoryStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java > TimelineServer should load pseudo authentication filter when authentication = > simple > > > Key: YARN-2228 > URL: https://issues.apache.org/jira/browse/YARN-2228 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch, > YARN-2228.4.patch, YARN-2228.5.patch, YARN-2228.6.patch > > > When kerberos authentication is not enabled, we should let the timeline > server to work with pseudo authentication filter. In this way, the sever is > able to detect the request user by checking "user.name". > On the other hand, timeline client should append "user.name" in un-secure > case as well, such that ACLs can keep working in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2260) Add containers to launchedContainers list in RMNode on container recovery
[ https://issues.apache.org/jira/browse/YARN-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061932#comment-14061932 ] Hudson commented on YARN-2260: -- FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/613/]) YARN-2260. Fixed ResourceManager's RMNode to correctly remember containers when nodes resync during work-preserving RM restart. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java > Add containers to launchedContainers list in RMNode on container recovery > - > > Key: YARN-2260 > URL: https://issues.apache.org/jira/browse/YARN-2260 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.6.0 > > Attachments: YARN-2260.1.patch, YARN-2260.2.patch > > > The justLaunchedContainers map in RMNode should be re-populated when > container is sent from NM for recovery. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API
[ https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061959#comment-14061959 ] Akira AJISAKA commented on YARN-1050: - Thanks [~kj-ki] for the update.
{code}
+"childQueues": {
+"clusterResources": {
+"memory": 8192,
+"vCores": 8
{code}
A '[' bracket is needed here, since "childQueues" is a collection. Minor nit: there are some trailing whitespaces in the JSON response body. > Document the Fair Scheduler REST API > > > Key: YARN-1050 > URL: https://issues.apache.org/jira/browse/YARN-1050 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Sandy Ryza >Assignee: Kenji Kikushima > Attachments: YARN-1050-2.patch, YARN-1050.patch > > > The documentation should be placed here along with the Capacity Scheduler > documentation: > http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API -- This message was sent by Atlassian JIRA (v6.2#6252)
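Akira's point can be shown with a small sketch of the corrected fragment: since a queue may have several children, "childQueues" must parse as a JSON array. The surrounding structure below is assumed for illustration; only the field names and values come from the snippet in the comment.

```python
import json

# Hypothetical corrected fragment of the documented Fair Scheduler response:
# "childQueues" is now a JSON array ('[' ... ']') holding queue objects.
corrected = json.loads("""
{
  "childQueues": [
    {
      "clusterResources": {
        "memory": 8192,
        "vCores": 8
      }
    }
  ]
}
""")

# A collection parses to a list, so documentation samples round-trip cleanly.
assert isinstance(corrected["childQueues"], list)
assert corrected["childQueues"][0]["clusterResources"]["memory"] == 8192
```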
[jira] [Updated] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2256: --- Attachment: (was: YARN-2256.patch) > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena > > Following audit logs are generated too many times(due to the possibility of a > large number of containers) : > 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a > container > 2. In RM - Audit logs corresponding to AM allocating a container and AM > releasing a container > We can have different log levels even for NM and RM audit logs and move these > successful container related logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2256: --- Attachment: YARN-2256.patch > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena > Attachments: YARN-2256.patch > > > Following audit logs are generated too many times(due to the possibility of a > large number of containers) : > 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a > container > 2. In RM - Audit logs corresponding to AM allocating a container and AM > releasing a container > We can have different log levels even for NM and RM audit logs and move these > successful container related logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062009#comment-14062009 ] Hadoop QA commented on YARN-2256: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655753/YARN-2256.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4305//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4305//console This message is automatically generated. 
> Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena > Attachments: YARN-2256.patch > > > Following audit logs are generated too many times(due to the possibility of a > large number of containers) : > 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a > container > 2. In RM - Audit logs corresponding to AM allocating a container and AM > releasing a container > We can have different log levels even for NM and RM audit logs and move these > successful container related logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062060#comment-14062060 ] Jason Lowe commented on YARN-2152: -- Yeah that's what I suspected as well, but I wanted to mention it in case I missed something. It's crucial we get the token compatibility sorted out sooner rather than later, otherwise I can see us regularly breaking compatibility between even minor versions as we tweak tokens to add features. Whenever that happens rolling upgrades will not work in practice. > Recover missing container information > - > > Key: YARN-2152 > URL: https://issues.apache.org/jira/browse/YARN-2152 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.5.0 > > Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch, > YARN-2152.3.patch > > > Container information such as container priority and container start time > cannot be recovered because NM container today lacks such container > information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2260) Add containers to launchedContainers list in RMNode on container recovery
[ https://issues.apache.org/jira/browse/YARN-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062075#comment-14062075 ] Hudson commented on YARN-2260: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/]) YARN-2260. Fixed ResourceManager's RMNode to correctly remember containers when nodes resync during work-preserving RM restart. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java > Add containers to launchedContainers list in RMNode on container recovery > - > > Key: YARN-2260 > URL: https://issues.apache.org/jira/browse/YARN-2260 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.6.0 > > Attachments: YARN-2260.1.patch, YARN-2260.2.patch > > > The justLaunchedContainers map in RMNode should be re-populated when > container is sent from NM for recovery. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple
[ https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062073#comment-14062073 ] Hudson commented on YARN-2228: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/]) YARN-2228. Augmented TimelineServer to load pseudo authentication filter when authentication = simple. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610575) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/ForbiddenException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/GenericExceptionHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestMemoryApplicationHistoryStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java > TimelineServer should load pseudo authentication filter when authentication = > simple > > > Key: YARN-2228 > URL: https://issues.apache.org/jira/browse/YARN-2228 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch, > YARN-2228.4.patch, YARN-2228.5.patch, YARN-2228.6.patch > > > When kerberos authentication is not enabled, we should let the timeline > server to work with pseudo authentication filter. In this way, the sever is > able to detect the request user by checking "user.name". > On the other hand, timeline client should append "user.name" in un-secure > case as well, such that ACLs can keep working in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2256) Too many nodemanager audit logs are generated
[ https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062079#comment-14062079 ] Varun Saxena commented on YARN-2256: Only changed the log level from INFO to DEBUG, so no tests need to be included. Manually tested the flows in which the changed audit logs appear. > Too many nodemanager audit logs are generated > - > > Key: YARN-2256 > URL: https://issues.apache.org/jira/browse/YARN-2256 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena > Attachments: YARN-2256.patch > > > Following audit logs are generated too many times(due to the possibility of a > large number of containers) : > 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a > container > 2. In RM - Audit logs corresponding to AM allocating a container and AM > releasing a container > We can have different log levels even for NM and RM audit logs and move these > successful container related logs to DEBUG. -- This message was sent by Atlassian JIRA (v6.2#6252)
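The change under review — demoting high-volume per-container success audit records from INFO to DEBUG so deployments can suppress them — can be sketched with an illustrative analogue (this is not the NMAuditLogger/RMAuditLogger code itself):

```python
import logging

audit = logging.getLogger("nm.audit")
audit.setLevel(logging.INFO)  # production default: per-container noise suppressed

# Capture what actually gets emitted so the filtering is observable.
records = []
handler = logging.Handler()
handler.emit = lambda rec: records.append(rec.getMessage())
audit.addHandler(handler)

def log_container_event(event, container_id, success=True):
    # Successful start/stop/finish events are high-volume, so log at DEBUG;
    # failures stay at INFO so they are never filtered out.
    level = logging.DEBUG if success else logging.INFO
    audit.log(level, "%s container=%s", event, container_id)

log_container_event("START", "container_01")                 # suppressed at INFO
log_container_event("START", "container_02", success=False)  # kept

assert records == ["START container=container_02"]
```

Flipping the logger to `logging.DEBUG` in a debug deployment would bring the per-container records back without any code change, which is the point of the patch.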
[jira] [Commented] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple
[ https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062171#comment-14062171 ] Hudson commented on YARN-2228: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/]) YARN-2228. Augmented TimelineServer to load pseudo authentication filter when authentication = simple. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610575) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/ForbiddenException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/GenericExceptionHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilter.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestMemoryApplicationHistoryStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java > TimelineServer should load pseudo authentication filter when authentication = > simple > > > Key: YARN-2228 > URL: https://issues.apache.org/jira/browse/YARN-2228 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch, > YARN-2228.4.patch, YARN-2228.5.patch, YARN-2228.6.patch > > > When kerberos authentication is not enabled, we should let the timeline > server to work with pseudo authentication filter. In this way, the sever is > able to detect the request user by checking "user.name". > On the other hand, timeline client should append "user.name" in un-secure > case as well, such that ACLs can keep working in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2260) Add containers to launchedContainers list in RMNode on container recovery
[ https://issues.apache.org/jira/browse/YARN-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062173#comment-14062173 ] Hudson commented on YARN-2260: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/]) YARN-2260. Fixed ResourceManager's RMNode to correctly remember containers when nodes resync during work-preserving RM restart. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610557) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java > Add containers to launchedContainers list in RMNode on container recovery > - > > Key: YARN-2260 > URL: https://issues.apache.org/jira/browse/YARN-2260 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.6.0 > > Attachments: YARN-2260.1.patch, YARN-2260.2.patch > > > The justLaunchedContainers map in RMNode should be re-populated when > container is sent from NM for recovery. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062212#comment-14062212 ] Jason Lowe commented on YARN-2045: -- bq. I agree the concept is not quite the same but I tend to handle them both together as either of change (protobuf schema or layout schema) will bring difficulty/risky for NMStateStoreService to load old version of data. I think lumping them together and handling them in the implementation-specific code is fine, but if the implementation is handling all the details then why is it exposed in the interface? I think the most telling point is that in the proposed patch no common code actually uses the interfaces that were added. Each implementation does its own version setting, its own compatibility check, and I assume its own marshaling in the future if necessary. The interfaces aren't called by common code. Maybe I'm not seeing the future use case of these methods? I guess it could be useful for common code to do logging/reporting of the persisted/current versions or maybe to do a very simplistic incompatibility check (e.g.: assume different major numbers means incompatible), although arguably the implementation could simply log these numbers as it initializes and is already doing an implementation-specific compatibility check. However I'm particularly doubtful of the storeVersion method as it seems like the only way to safely convert versions in the general sense is with implementation-specific code. Using the conversion pseudo-code above as an example, if we crash halfway through the conversion of a series of objects then we have a mix of old and new data on the next restart but the stored version number is still old (or vice-versa if we store the new version first then convert). In an implementation-specific approach it may be possible to make the conversion atomic, e.g.: using a batch write for the entire conversion in leveldb. 
Therefore it makes more sense to me that an implementation should be responsible for deciding when and how to update the persisted schema version. I would expect implementations to do this sort of conversion during initialization and potentially the old persisted version would never be seen since it would already be converted. Do you have an example where using the storeVersion method in the interface via implementation-independent code would be more appropriate and therefore the storeVersion method in the interface is necessary? To summarize, I can see exposing the ability to get the persisted and current state store versions in the interface for logging, etc. However I don't see how implementation-independent code can properly update the version via the interface. We're lumping both interface and implementation-specific schema changes in the same version number, and it isn't possible to do an update of multiple store objects atomically via the current interface. bq. Are you suggesting NMDBSchemaVersion to play as PBImpl directly to include raw protobuf or something else? Sort of a subset of what the PBImpl is doing. I was thinking of having NMDBSchemaVersion wrap the protobuf but in a read-only way (i.e.: no set methods, no builder stuff). If one wants to change the version number, create a new protobuf. PBImpls tend to get into trouble because they can be written, and it's simpler to treat the protobufs as immutable as they were intended. Another approach would be to simply have some static helper util methods that take two protobufs to do the compatibility checks, etc. Although I don't think we can really implement a useful isCompatibleTo check in implementation-independent code since the version number encodes implementation-specific schema information. Anyway I didn't mean to drag out this change for too long. 
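The "very simplistic incompatibility check" floated earlier (assume a different major number means incompatible) could be sketched like this; the major.minor encoding and the function name are assumptions for illustration, not the NMDBSchemaVersion API:

```python
def is_compatible(persisted: str, current: str) -> bool:
    """Simplistic schema check: same major version means the store is
    loadable (possibly after an implementation-specific conversion);
    a different major version means refuse to load the state store."""
    persisted_major, _, _ = persisted.partition(".")
    current_major, _, _ = current.partition(".")
    return persisted_major == current_major

assert is_compatible("1.0", "1.2")      # minor bump: old data still readable
assert not is_compatible("1.2", "2.0")  # major bump: incompatible layout
```

As the comment argues, anything beyond this trivial comparison ends up depending on implementation-specific schema knowledge, which is why the check arguably belongs inside the store implementation rather than in the common interface.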
I'm wondering about these interfaces since I'm a strong believer that interfaces should be minimal and necessary, and I'm having difficulty seeing how these interfaces are really going to be used. However I'm probably in the minority on these methods. If people feel strongly that these interfaces are necessary and useful then go ahead and add them. It seems to me that these interfaces will either never be called or only called for trivial reasons (e.g.: logging). However I don't think having them is going to break anything or be an unreasonable burden on an implementation, rather just extra baggage that state store implementations have to expose. As for the PBImpl, it's mostly a nit. If you really would rather keep it in I guess that's fine. We should be able to remove it later if we realize we don't have a use for it. The main change I think has to be made is the leveldb schema check should handle the original method for storing the schema. Two ways to handle that are either explicitly check for the "1.0" string before trying to parse the
[jira] [Updated] (YARN-2287) Add audit log levels for NM and RM
[ https://issues.apache.org/jira/browse/YARN-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2287: --- Attachment: YARN-2287.patch Kindly review the patch > Add audit log levels for NM and RM > -- > > Key: YARN-2287 > URL: https://issues.apache.org/jira/browse/YARN-2287 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.4.1 >Reporter: Varun Saxena > Attachments: YARN-2287.patch > > > NM and RM audit logging can be done based on log level as some of the audit > logs, especially the container audit logs appear too many times. By > introducing log level, certain audit logs can be suppressed, if not required > in deployment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2233: Attachment: apache-yarn-2233.4.patch {quote} bq.It seems to me that all API implementations should take the fulll principle name if available. I meant to replace all occurrences of getCallerUserGroupInformation(hsr), if that makes sense. {quote} Fixed this. The principal is used everywhere. {quote} bq.We should set all the fields of a DT - token, renewer, expiration-time all the time - new-token, renew-token? renewDelegationToken only returns only the expiry-time and getToken only returns the token. This is consistent with RPCs. But I think in a followup, we should fix this. Fixed. bq. You meant we will fix this in a separate JIRA? I still see renewToken not returning the entire token info. I'm okay doing it separately, just clarifying what you said.. {quote} I've fixed this for creating a new delegation token but I didn't fix it for renew token. I think it's ok to fix it as part of a separate JIRA. > Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062261#comment-14062261 ] Eric Payne commented on YARN-415: - Hi [~leftnoteasy]. Thank you very much for reviewing my patch. I think I understand what you are suggesting. Please let me clarify: {quote} 1) Add memory utilization to RMAppMetrics/RMAppAttemptMetrics {quote} Since every RMAppAttemptImpl object has a reference to an RMAppAttemptMetrics object, you are suggesting that I move the resource usage stats to RMAppAttemptMetrics. Also, when reporting on resource usage, use the reporting methods from RMAppAttempt and RMApp. {quote} 2) Keep running container resource utilization in SchedulerApplicationAttempt {quote} As the patch for YARN-415 currently stands, it keeps resource usage stats for both running and finished containers in the SchedulerApplicationAttempt object. Your suggestion is to keep resource usage stats only for running containers. {quote} 3) Move completed container resource calculation to RMContainerImpl#FinishTransition {quote} For completed containers, you are suggesting that the calculation be done for final resource usage stats within the RMContainerImpl#FinishTransition method and have that send the resource stats as a payload within the RMAppAttemptContainerFinishedEvent event. Then, when RMAppAttemptImpl receives the CONTAINER_FINISHED event, it would add the resource usage stats for the finished containers to those already collected within the RMAppAttemptMetrics object. Is that correct? 
> Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
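The charging formula in the issue description translates almost directly into code; a small sketch with hypothetical container records (reserved memory in MB, lifetime in seconds):

```python
def memory_mb_seconds(containers):
    # (reserved MB for container i) * (lifetime of container i), summed.
    # Charges for what was reserved, not what was actually used, since
    # reserved memory was unavailable to everyone else either way.
    return sum(mem_mb * lifetime_s for mem_mb, lifetime_s in containers)

# Hypothetical app: two 1 GB containers for 60 s plus one 2 GB container for 30 s.
usage = memory_mb_seconds([(1024, 60), (1024, 60), (2048, 30)])
assert usage == 1024 * 60 * 2 + 2048 * 30  # 184320 MB-seconds
```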
[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062270#comment-14062270 ] Varun Vasudev commented on YARN-2233: - [~tucu00] I'm going to file another ticket to migrate over to the hadoop-common implementation once you've committed the changes (and once support for passing tokens via headers is added). > Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2290) Add support for passing delegation tokens via headers for web services
Varun Vasudev created YARN-2290: --- Summary: Add support for passing delegation tokens via headers for web services Key: YARN-2290 URL: https://issues.apache.org/jira/browse/YARN-2290 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev HADOOP-10799 refactors the WebHDFS code to handle delegation tokens as part of hadoop-common. We should add support to pass delegation tokens as a header instead of as part of the URL. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2291) Timeline and RM web services should use same authentication code
Varun Vasudev created YARN-2291: --- Summary: Timeline and RM web services should use same authentication code Key: YARN-2291 URL: https://issues.apache.org/jira/browse/YARN-2291 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev The TimelineServer and the RM web services have very similar requirements and implementation for authentication via delegation tokens apart from the fact that the RM web services requires delegation tokens to be passed as a header. They should use the same code base instead of different implementations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062281#comment-14062281 ] Milan Potocnik commented on YARN-1994: -- Both TestFSDownload and TestMemoryApplicationHistoryStore pass on my box and do not seem to be related to the change. > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, > YARN-1994.3.patch, YARN-1994.4.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
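The requirement the issue describes — let the daemon bind the wildcard address so all interfaces are served, while clients are always given a concrete, routable hostname — can be sketched as follows. The config keys and hostnames are hypothetical, not the actual YARN properties:

```python
def resolve_addresses(conf):
    # Clients must receive a routable host, never 0.0.0.0; the server may
    # optionally bind the wildcard so every interface on a multihomed
    # machine is reachable.
    client_addr = (conf["host"], conf["port"])
    bind_host = "0.0.0.0" if conf.get("bind-wildcard") else conf["host"]
    return client_addr, (bind_host, conf["port"])

client, server = resolve_addresses(
    {"host": "rm.example.com", "port": 8032, "bind-wildcard": True})
assert client == ("rm.example.com", 8032)  # what clients connect to
assert server == ("0.0.0.0", 8032)         # what the daemon binds
```

Keeping the two addresses separate is what prevents the failure mode noted in the description, where advertising INADDR_ANY makes clients try to connect to 0.0.0.0.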
[jira] [Created] (YARN-2292) RM web services should use hadoop-common for authentication using delegation tokens
Varun Vasudev created YARN-2292: --- Summary: RM web services should use hadoop-common for authentication using delegation tokens Key: YARN-2292 URL: https://issues.apache.org/jira/browse/YARN-2292 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev HADOOP-10771 refactors the WebHDFS authentication code to hadoop-common. YARN-2290 will add support for passing delegation tokens via headers. Once support is added RM web services should use the authentication code from hadoop-common -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields
[ https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062331#comment-14062331 ] Vinod Kumar Vavilapalli commented on YARN-668: -- One other important point from the design doc at YARN-666 is to make sure that, during the upgrade, tokens are accepted by both the old and new NMs. We need some magic on the ResourceManager. > TokenIdentifier serialization should consider Unknown fields > > > Key: YARN-668 > URL: https://issues.apache.org/jira/browse/YARN-668 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Vinod Kumar Vavilapalli > > This would allow changing of the TokenIdentifier between versions. The > current serialization is Writable. A simple way to achieve this would be to > have a Proto object as the payload for TokenIdentifiers, instead of > individual fields. > TokenIdentifier continues to implement Writable to work with the RPC layer - > but the payload itself is serialized using PB. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062332#comment-14062332 ] Vinod Kumar Vavilapalli commented on YARN-2152: --- Yeah, I just realized that YARN-668 already exists for this. I made a comment there to make sure we don't miss this. I was actively thinking about it, this is one of the big pending issues for rolling upgrades.. > Recover missing container information > - > > Key: YARN-2152 > URL: https://issues.apache.org/jira/browse/YARN-2152 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Fix For: 2.5.0 > > Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch, > YARN-2152.3.patch > > > Container information such as container priority and container start time > cannot be recovered because NM container today lacks such container > information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2233: Attachment: apache-yarn-2233.5.patch Uploaded new patch fixing findbug error. The test case failures are due to TestClientRMService.testForceKillApplication failing which lead to a whole bunch of subsequent tests to fail. > Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch, > apache-yarn-2233.5.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2100) Refactor the Timeline Server code for Kerberos + DT authentication
[ https://issues.apache.org/jira/browse/YARN-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2100: -- Summary: Refactor the Timeline Server code for Kerberos + DT authentication (was: Refactor the code of Kerberos + DT authentication) > Refactor the Timeline Server code for Kerberos + DT authentication > -- > > Key: YARN-2100 > URL: https://issues.apache.org/jira/browse/YARN-2100 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > The customized Kerberos + DT authentication of the timeline server largely > refers to that of Http FS, therefore, there're a portion of duplicate code. > We should think about refactor the code if it is necessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2292) RM web services should use hadoop-common for authentication using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062379#comment-14062379 ] Vinod Kumar Vavilapalli commented on YARN-2292: --- YARN-2100 is the related ticket for Timeline Service.. > RM web services should use hadoop-common for authentication using delegation > tokens > --- > > Key: YARN-2292 > URL: https://issues.apache.org/jira/browse/YARN-2292 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > > HADOOP-10771 refactors the WebHDFS authentication code to hadoop-common. > YARN-2290 will add support for passing delegation tokens via headers. Once > support is added RM web services should use the authentication code from > hadoop-common -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2100) Refactor the Timeline Server code for Kerberos + DT authentication
[ https://issues.apache.org/jira/browse/YARN-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2100: -- Target Version/s: 2.6.0 > Refactor the Timeline Server code for Kerberos + DT authentication > -- > > Key: YARN-2100 > URL: https://issues.apache.org/jira/browse/YARN-2100 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > The customized Kerberos + DT authentication of the timeline server largely > refers to that of Http FS, therefore, there're a portion of duplicate code. > We should think about refactor the code if it is necessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2292) RM web services should use hadoop-common for authentication using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2292: -- Target Version/s: 2.6.0 > RM web services should use hadoop-common for authentication using delegation > tokens > --- > > Key: YARN-2292 > URL: https://issues.apache.org/jira/browse/YARN-2292 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > > HADOOP-10771 refactors the WebHDFS authentication code to hadoop-common. > YARN-2290 will add support for passing delegation tokens via headers. Once > support is added RM web services should use the authentication code from > hadoop-common -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062382#comment-14062382 ] Hadoop QA commented on YARN-2284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655640/YARN2284-01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1262 javac compiler warnings (more than the trunk's current 1258 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4307//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.ipc.TestIPC org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem org.apache.hadoop.fs.TestSymlinkLocalFSFileContext org.apache.hadoop.yarn.util.TestFSDownload {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4307//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4307//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4307//console This message is automatically generated. 
> Find missing config options in YarnConfiguration and yarn-default.xml > - > > Key: YARN-2284 > URL: https://issues.apache.org/jira/browse/YARN-2284 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.4.1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > Attachments: YARN2284-01.patch > > > YarnConfiguration has one set of properties. yarn-default.xml has another > set of properties. Ideally, there should be an automatic way to find missing > properties in either location. > This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062350#comment-14062350 ] Hadoop QA commented on YARN-2233: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655791/apache-yarn-2233.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-auth hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4306//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4306//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4306//console This message is automatically generated. > Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2291) Timeline and RM web services should use same authentication code
[ https://issues.apache.org/jira/browse/YARN-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062392#comment-14062392 ] Vinod Kumar Vavilapalli commented on YARN-2291: --- This is likely a dup of the combination of YARN-2100 & YARN-2292. Keeping it open for now, we can close as later as is needed. > Timeline and RM web services should use same authentication code > > > Key: YARN-2291 > URL: https://issues.apache.org/jira/browse/YARN-2291 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > > The TimelineServer and the RM web services have very similar requirements and > implementation for authentication via delegation tokens apart from the fact > that the RM web services requires delegation tokens to be passed as a header. > They should use the same code base instead of different implementations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062431#comment-14062431 ] Sunil G commented on YARN-796: -- Hi [~gp.leftnoteasy] Great. This feature will be a big addition to YARN. I have few thoughts on this. 1. In our use case scenarios, we are more likely to have OR and NOT. I feel combination of these labels need to be in a defined or restricted way. Result of some combinations (AND, OR and NOT) may come invalid, and some may need to be reduced. This complexity need not have to bring to RM to take a final decision. 2. *Reservation*: If a node label has many nodes under it, then there is a chance of reservation. Valid candidates may come later, so solution can be look in to this aspect also. Node Label level reservations ? 3. Centralized Configuration: If a new node is added to cluster, may be it can be started by having a label configuration in its yarn-site.xml. This may be fine I feel. your thoughts? > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2219) AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts
[ https://issues.apache.org/jira/browse/YARN-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2219: -- Attachment: YARN-2219.2.patch > AMs and NMs can get exceptions after recovery but before scheduler knowns > apps and app-attempts > --- > > Key: YARN-2219 > URL: https://issues.apache.org/jira/browse/YARN-2219 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Ashwin Shankar >Assignee: Jian He > Attachments: YARN-2219.1.patch, YARN-2219.2.patch > > > {code} > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testAppReregisterOnRMWorkPreservingRestart[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 4.335 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:91) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:113) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:110) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.registerAppAttempt(MockAM.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testAppReregisterOnRMWorkPreservingRestart(TestWorkPreservingRMRestart.java:562) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2219) AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts
[ https://issues.apache.org/jira/browse/YARN-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062492#comment-14062492 ] Jian He commented on YARN-2219: --- Fixed the comments bq. instead of the shouldNotifyAppAccepted nomenclature, we can say isAppRecovering and flip the logic Updated the naming for attempt also to be consistent. > AMs and NMs can get exceptions after recovery but before scheduler knowns > apps and app-attempts > --- > > Key: YARN-2219 > URL: https://issues.apache.org/jira/browse/YARN-2219 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Ashwin Shankar >Assignee: Jian He > Attachments: YARN-2219.1.patch, YARN-2219.2.patch > > > {code} > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testAppReregisterOnRMWorkPreservingRestart[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 4.335 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:91) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:113) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:110) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.registerAppAttempt(MockAM.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testAppReregisterOnRMWorkPreservingRestart(TestWorkPreservingRMRestart.java:562) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062509#comment-14062509 ] Hadoop QA commented on YARN-2233: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655816/apache-yarn-2233.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-auth hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4308//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4308//console This message is automatically generated. 
> Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch, > apache-yarn-2233.5.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
Sunil G created YARN-2293: - Summary: Scoring for NMs to identify a better candidate to launch AMs Key: YARN-2293 URL: https://issues.apache.org/jira/browse/YARN-2293 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Reporter: Sunil G Assignee: Sunil G Container exit status from NM is giving indications of reasons for its failure. Some times, it may be because of container launching problems in NM. In a heterogeneous cluster, some machines with weak hardware may cause more failures. It will be better not to launch AMs there more often. Also I would like to clear that container failures because of buggy job should not result in decreasing score. As mentioned earlier, based on exit status if a scoring mechanism is added for NMs in RM, then NMs with better scores can be given for launching AMs. Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062530#comment-14062530 ] Jason Lowe commented on YARN-2293: -- This sounds very similar to YARN-2005, if a bit more general. This approach sounds like it could support a "gray" area for NMs where it really doesn't like to launch AMs on a node but may choose to do so anyway if that's the only place it can find. It may be more fruitful to continue this discussion over on YARN-2005 and hash through how exit status would map to scoring adjustments, how the score would affect scheduling, and work through various corner cases. > Scoring for NMs to identify a better candidate to launch AMs > > > Key: YARN-2293 > URL: https://issues.apache.org/jira/browse/YARN-2293 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G > > Container exit status from NM is giving indications of reasons for its > failure. Some times, it may be because of container launching problems in NM. > In a heterogeneous cluster, some machines with weak hardware may cause more > failures. It will be better not to launch AMs there more often. Also I would > like to clear that container failures because of buggy job should not result > in decreasing score. > As mentioned earlier, based on exit status if a scoring mechanism is added > for NMs in RM, then NMs with better scores can be given for launching AMs. > Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2219) AMs and NMs can get exceptions after recovery but before scheduler knowns apps and app-attempts
[ https://issues.apache.org/jira/browse/YARN-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062547#comment-14062547 ] Hadoop QA commented on YARN-2219: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655840/YARN-2219.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4309//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4309//console This message is automatically generated. 
> AMs and NMs can get exceptions after recovery but before scheduler knowns > apps and app-attempts > --- > > Key: YARN-2219 > URL: https://issues.apache.org/jira/browse/YARN-2219 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Ashwin Shankar >Assignee: Jian He > Attachments: YARN-2219.1.patch, YARN-2219.2.patch > > > {code} > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testAppReregisterOnRMWorkPreservingRestart[0](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 4.335 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:91) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:113) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM$1.run(MockAM.java:110) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.registerAppAttempt(MockAM.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testAppReregisterOnRMWorkPreservingRestart(TestWorkPreservingRMRestart.java:562) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2264) Race in DrainDispatcher can cause random test failures
[ https://issues.apache.org/jira/browse/YARN-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062551#comment-14062551 ] Jian He commented on YARN-2264: --- patch looks good. > Race in DrainDispatcher can cause random test failures > -- > > Key: YARN-2264 > URL: https://issues.apache.org/jira/browse/YARN-2264 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Li Lu > Attachments: YARN-2264-070814.patch > > > This is what can happen. > This is the potential race. > DrainDispatcher is started via serviceStart() . As a last step, this starts > the actual dispatcher thread (eventHandlingThread.start() - and returns > immediately - which means the thread may or may not have started up by the > time start returns. > Event sequence: > UserThread: calls dispatcher.getEventHandler().handle() > This sets drained = false, and a context switch happens. > DispatcherThread: starts running > DispatcherThread drained = queue.isEmpty(); -> This sets drained to true, > since Thread1 yielded before putting anything into the queue. > UserThread: actual.handle(event) - which puts the event in the queue for the > dispatcher thread to process, and returns control. > UserThread: dispatcher.await() - Since drained is true, this returns > immediately - even though there is a pending event to process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062573#comment-14062573 ] Jian He commented on YARN-2211: --- some comments: - setCurrnetMasterKeyData, setNextMasterKeyData methods not used - change not needed? {code} |- AMRMTOKEN_SECRET_MANAGER_ROOT_ZNODE_NAME {code} - Fix “System.out.println(stateData.getCurrentTokenMasterKey());” in FileSystemRMStateStore - Test: add test in restart scenario that AM issued with rolled-over AMRMToken is still able to communicate with restarted RM. testAppAttemptTokensRestoredOnRMRestart may help writing the test. > RMStateStore needs to save AMRMToken master key for recovery when RM > restart/failover happens > -- > > Key: YARN-2211 > URL: https://issues.apache.org/jira/browse/YARN-2211 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch > > > After YARN-2208, AMRMToken can be rolled over periodically. We need to save > related Master Keys and use them to recover the AMRMToken when RM > restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062578#comment-14062578 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654821/YARN-2069-trunk-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4311//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4311//console This message is automatically generated. 
> CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch > > > This is different from (even if related to, and likely shares code with) > YARN-2113. > YARN-2113 focuses on making sure that even if a queue has its guaranteed > capacity, its individual users are treated in line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
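The idea of respecting user-limits during preemption can be sketched as a victim-selection loop that skips any container whose removal would push its user below the user-limit. This is an illustrative model with hypothetical names, not the actual ProportionalCapacityPreemptionPolicy logic:

```python
# Illustrative sketch only (not CapacityScheduler code): when preempting to
# rebalance queue capacity, never take a container whose user would drop
# below their user-limit share as a result.

def select_preemption_victims(containers, user_usage, user_limit, needed):
    """containers: list of (user, resource) pairs, newest first.
    user_usage: dict user -> current usage (mutated as victims are chosen).
    user_limit: per-user floor; needed: resources to reclaim."""
    victims = []
    freed = 0
    for user, res in containers:
        if freed >= needed:
            break
        # Respect user-limits: skip if preemption would go below the floor.
        if user_usage[user] - res < user_limit:
            continue
        victims.append((user, res))
        user_usage[user] -= res
        freed += res
    return victims

# Example: the queue needs 4 units back; user B is already at the limit (4),
# so only user A's containers are preempted.
usage = {"A": 10, "B": 4}
print(select_preemption_victims([("A", 2), ("A", 2), ("B", 2)], usage,
                                user_limit=4, needed=4))
```

The loop stops as soon as enough resource is reclaimed, so a user sitting exactly at their limit is never touched.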
[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2211: Attachment: YARN-2211.4.patch > RMStateStore needs to save AMRMToken master key for recovery when RM > restart/failover happens > -- > > Key: YARN-2211 > URL: https://issues.apache.org/jira/browse/YARN-2211 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, > YARN-2211.4.patch > > > After YARN-2208, AMRMToken can be rolled over periodically. We need to save > related Master Keys and use them to recover the AMRMToken when RM > restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062617#comment-14062617 ] Xuan Gong commented on YARN-2211: - bq. setCurrnetMasterKeyData, setNextMasterKeyData methods not used Removed bq. change not needed? |- AMRMTOKEN_SECRET_MANAGER_ROOT_ZNODE_NAME Removed bq. Fix “System.out.println(stateData.getCurrentTokenMasterKey());” in FileSystemRMStateStore Removed bq. Test: add test in restart scenario that AM issued with rolled-over AMRMToken is still able to communicate with restarted RM. testAppAttemptTokensRestoredOnRMRestart may help writing the test. Yes, will add this Testcase in next ticket YARN-2212 > RMStateStore needs to save AMRMToken master key for recovery when RM > restart/failover happens > -- > > Key: YARN-2211 > URL: https://issues.apache.org/jira/browse/YARN-2211 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, > YARN-2211.4.patch > > > After YARN-2208, AMRMToken can be rolled over periodically. We need to save > related Master Keys and use them to recover the AMRMToken when RM > restart/failover happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2285) Preemption can cause capacity scheduler to show 5,000% queue absolute used capacity.
[ https://issues.apache.org/jira/browse/YARN-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062693#comment-14062693 ] Tassapol Athiapinya commented on YARN-2285: --- After a closer look, 5000% is a valid number. It means 5000% of the "guaranteed capacity" of queue A (about 50% of absolute used capacity). I am changing the jira title accordingly. I will also make this an improvement jira instead of a bug. The point here becomes whether it would be nice to "re-label" the text in the web UI to better reflect its meaning: "% used next to a queue is % of guaranteed queue capacity, not absolute used capacity". > Preemption can cause capacity scheduler to show 5,000% queue absolute used > capacity. > > > Key: YARN-2285 > URL: https://issues.apache.org/jira/browse/YARN-2285 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.0 > Environment: Turn on CS Preemption. >Reporter: Tassapol Athiapinya > Attachments: preemption_5000_percent.png > > > I configure queue A, B to have 1%, 99% capacity respectively. There is no max > capacity for each queue. Set high user limit factor. > Submit app 1 to queue A. AM container takes 50% of cluster memory. Task > containers take another 50%. Submit app 2 to queue B. Preempt task containers > of app 1 out. Turns out capacity of queue B increases to 99% but queue A has > 5000% used. -- This message was sent by Atlassian JIRA (v6.2#6252)
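The 5,000% reading is simple arithmetic once the denominator is known: the UI reports usage relative to the queue's guaranteed capacity, not to the whole cluster. A minimal sketch with the numbers from this report (queue A guaranteed 1% of the cluster, holding about 50% of it); the function name is ours, not YARN's:

```python
def pct_of_guaranteed(absolute_used_pct, guaranteed_pct):
    """Express a queue's usage as a percentage of its guaranteed capacity."""
    return 100.0 * absolute_used_pct / guaranteed_pct

# Queue A: guaranteed 1% of the cluster, actually holding ~50% of it.
print(pct_of_guaranteed(50, 1))   # 5000.0 -> the "5,000%" shown in the UI
# Queue B: guaranteed 99%, holding 99% after preemption.
print(pct_of_guaranteed(99, 99))  # 100.0
```

So the number is internally consistent; the relabeling question is purely about making the denominator explicit in the UI.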
[jira] [Updated] (YARN-2285) Preemption can cause capacity scheduler to show 5,000% queue capacity.
[ https://issues.apache.org/jira/browse/YARN-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tassapol Athiapinya updated YARN-2285: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Summary: Preemption can cause capacity scheduler to show 5,000% queue capacity. (was: Preemption can cause capacity scheduler to show 5,000% queue absolute used capacity.) > Preemption can cause capacity scheduler to show 5,000% queue capacity. > -- > > Key: YARN-2285 > URL: https://issues.apache.org/jira/browse/YARN-2285 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.5.0 > Environment: Turn on CS Preemption. >Reporter: Tassapol Athiapinya >Priority: Minor > Attachments: preemption_5000_percent.png > > > I configure queue A, B to have 1%, 99% capacity respectively. There is no max > capacity for each queue. Set high user limit factor. > Submit app 1 to queue A. AM container takes 50% of cluster memory. Task > containers take another 50%. Submit app 2 to queue B. Preempt task containers > of app 1 out. Turns out capacity of queue B increases to 99% but queue A has > 5000% used. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062698#comment-14062698 ] Mayank Bansal commented on YARN-1408: - +1. Committing. Thanks [~sunilg] for the patch. Thanks [~jianhe], [~vinodkv] and [~wangda] for the reviews. Thanks, Mayank > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, > Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, > Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, > Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submit a jobB to queue b which would use less than 20% of cluster > capacity > A jobA task which uses queue b capacity has been preempted and killed. > This caused the problem below: > 1. A new Container got allocated for jobA in Queue A as per a node update > from an NM. > 2. This container was preempted immediately as per preemption. > Here the ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached the RM. 
> ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout of 30 minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
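The failure mode in the description can be modeled as an event arriving in a state with no registered transition. The toy state table below is illustrative only (the real RMContainerImpl state machine is far larger); it just shows why a late ACQUIRED event delivered to an already-killed container surfaces as an InvalidStateTransitonException:

```python
# Minimal sketch of the race described above (not RMContainerImpl itself):
# preemption kills a freshly allocated container before the AM heartbeat
# delivers ACQUIRED, so the event arrives in a state with no transition.

class InvalidStateTransitionError(Exception):
    pass

# (current_state, event) -> next_state; deliberately incomplete, like any
# real state machine table that omits impossible-by-design combinations.
TRANSITIONS = {
    ("ALLOCATED", "ACQUIRED"): "ACQUIRED",
    ("ALLOCATED", "KILL"): "KILLED",
    ("ACQUIRED", "KILL"): "KILLED",
}

def handle(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidStateTransitionError(
            "Invalid event: %s at %s" % (event, state))

state = handle("ALLOCATED", "KILL")  # preemption kills the new container
# The next AM heartbeat then tries to acquire the already-killed container:
# handle(state, "ACQUIRED") raises "Invalid event: ACQUIRED at KILLED".
```

The fix committed here makes the scheduler tolerate that late event instead of crashing the dispatcher and leaving the task to hit its 30-minute timeout.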
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062696#comment-14062696 ] Hadoop QA commented on YARN-1408: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655718/Yarn-1408.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4312//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4312//console This message is automatically generated. 
> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, > Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, > Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, > Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capcity is been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2285) Preemption can cause capacity scheduler to show 5,000% queue capacity.
[ https://issues.apache.org/jira/browse/YARN-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062697#comment-14062697 ] Tassapol Athiapinya commented on YARN-2285: --- Also, it is not major, but the percentage shown is not right. In the attached screenshot, root queue used is 146.5%. > Preemption can cause capacity scheduler to show 5,000% queue capacity. > -- > > Key: YARN-2285 > URL: https://issues.apache.org/jira/browse/YARN-2285 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.5.0 > Environment: Turn on CS Preemption. >Reporter: Tassapol Athiapinya >Priority: Minor > Attachments: preemption_5000_percent.png > > > I configure queue A, B to have 1%, 99% capacity respectively. There is no max > capacity for each queue. Set high user limit factor. > Submit app 1 to queue A. AM container takes 50% of cluster memory. Task > containers take another 50%. Submit app 2 to queue B. Preempt task containers > of app 1 out. Turns out capacity of queue B increases to 99% but queue A has > 5000% used. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062716#comment-14062716 ] Hudson commented on YARN-1408: -- FAILURE: Integrated in Hadoop-trunk-Commit #5887 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5887/]) YARN-1408 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins. (Sunil G via mayank) (mayank: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610860) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
[jira] [Updated] (YARN-1336) Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1336: - Attachment: NMRestartDesignOverview.pdf Attaching a PDF that briefly describes the approach and how the methods of the state store interface are used to persist and recover state. > Work-preserving nodemanager restart > --- > > Key: YARN-1336 > URL: https://issues.apache.org/jira/browse/YARN-1336 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup.patch > > > This serves as an umbrella ticket for tasks related to work-preserving > nodemanager restart. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062727#comment-14062727 ] Craig Welch commented on YARN-1680: --- It looks like this won't account for nodes that are blacklisted based on their rack; I think this is an uncovered case. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The running job's reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Maps got killed), so the MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are running in the cluster now. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headRoom includes blacklisted nodes' memory. This makes > jobs hang forever (ResourceManager does not assign any new containers on > blacklisted nodes but returns availableResources computed from cluster free > memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
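The fix the report asks for amounts to excluding blacklisted nodes' free memory when computing the headroom returned to the AM. A hedged sketch with hypothetical names and made-up numbers in MB (note it keys on node names only, so it shares the rack-level-blacklisting gap raised in the comment above):

```python
# Sketch of the headroom adjustment requested in YARN-1680 (names are
# hypothetical, not the RM's actual code): free memory on blacklisted
# nodes is unplaceable for this AM and must not be advertised as headroom.

def available_headroom(nodes, blacklisted):
    """nodes: dict node name -> free memory (MB); blacklisted: set of names."""
    return sum(free for node, free in nodes.items()
               if node not in blacklisted)

nodes = {"NM-1": 0, "NM-2": 0, "NM-3": 1024, "NM-4": 2048}
naive = sum(nodes.values())                   # counts blacklisted NM-4: 3072
usable = available_headroom(nodes, {"NM-4"})  # excludes NM-4: 1024
# Advertising 3072 makes the AM believe reducers need not be preempted,
# while only 1024 MB is actually placeable, so the job hangs.
print(naive, usable)
```

Matching against racks as well as node names would close the uncovered case mentioned in the comment, but that is outside this sketch.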
[jira] [Created] (YARN-2294) Update sample program and documentations for writing YARN Application
Li Lu created YARN-2294: --- Summary: Update sample program and documentations for writing YARN Application Key: YARN-2294 URL: https://issues.apache.org/jira/browse/YARN-2294 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Many APIs for writing YARN applications have been stabilized. However, some of them have changed since the sample YARN programs, like distributed shell, and the documentation were last updated. There are on-going discussions on the users mailing list about updating the outdated "Writing YARN Applications" documentation. Updating the sample programs like distributed shell is also needed, since they are probably the first demonstration of YARN applications that newcomers see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2295) Updating Client of YARN distributed shell with existing public stable API
Li Lu created YARN-2295: --- Summary: Updating Client of YARN distributed shell with existing public stable API Key: YARN-2295 URL: https://issues.apache.org/jira/browse/YARN-2295 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Some API calls in YARN distributed shell client have been marked as unstable and private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Update Client of YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Summary: Update Client of YARN distributed shell with existing public stable API (was: Updating Client of YARN distributed shell with existing public stable API) > Update Client of YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514.patch > > > Some API calls in YARN distributed shell client have been marked as unstable > and private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Updating Client of YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Attachment: YARN-2295-071514.patch Replacing the unstable privately visible Records.newRecord method with the newInstance method for each class. > Updating Client of YARN distributed shell with existing public stable API > - > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514.patch > > > Some API calls in YARN distributed shell client have been marked as unstable > and private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2296) Update Application Master of YARN distributed shell with existing public stable API
Li Lu created YARN-2296: --- Summary: Update Application Master of YARN distributed shell with existing public stable API Key: YARN-2296 URL: https://issues.apache.org/jira/browse/YARN-2296 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Update Client of YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Attachment: YARN-2295-071514-1.patch Updated patch with refactoring in both AM and Client > Update Client of YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514.patch > > > Some API calls in YARN distributed shell client have been marked as unstable > and private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Description: Some API calls in YARN distributed shell have been marked as unstable and private. Use existing public stable API to replace them, if possible. (was: Some API calls in YARN distributed shell client have been marked as unstable and private. Use existing public stable API to replace them, if possible. ) Summary: Refactor YARN distributed shell with existing public stable API (was: Update Client of YARN distributed shell with existing public stable API) > Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2296) Update Application Master of YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-2296. - Resolution: Duplicate Merged into YARN-2295 > Update Application Master of YARN distributed shell with existing public > stable API > --- > > Key: YARN-2296 > URL: https://issues.apache.org/jira/browse/YARN-2296 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062799#comment-14062799 ] Hadoop QA commented on YARN-2295: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655897/YARN-2295-071514.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4313//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4313//console This message is automatically generated. 
> Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062808#comment-14062808 ] Vinod Kumar Vavilapalli commented on YARN-2233: --- Looks good, +1. Checking this in.. > Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch, > apache-yarn-2233.5.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1695) Implement the rest (writable APIs) of RM web-services
[ https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1695: -- Priority: Major (was: Blocker) > Implement the rest (writable APIs) of RM web-services > - > > Key: YARN-1695 > URL: https://issues.apache.org/jira/browse/YARN-1695 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Varun Vasudev > > MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs > added there were only focused on obtaining information from the cluster. We > need to have the following REST APIs to finish the feature > - Application submission/termination (Priority): This unblocks easy client > interaction with a YARN cluster > - Application Client protocol: For resource scheduling by apps written in an > arbitrary language. Will have to think about throughput concerns > - ContainerManagement Protocol: Again for arbitrary language apps. > One important thing to note here is that we already have client libraries on > all the three protocols that do some heavy-lifting. One part of the > effort is to figure out if they can be made any thinner and/or how > web-services will implement the same functionality. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062826#comment-14062826 ] Hadoop QA commented on YARN-2295: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655903/YARN-2295-071514-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4314//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4314//console This message is automatically generated. 
> Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062831#comment-14062831 ] Hudson commented on YARN-2233: -- FAILURE: Integrated in Hadoop-trunk-Commit #5888 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5888/]) YARN-2233. Implemented ResourceManager web-services to create, renew and cancel delegation tokens. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610876) * /hadoop/common/trunk/hadoop-common-project/hadoop-auth/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/DelegationToken.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > Implement web services to create, renew and cancel delegation tokens > > > Key: YARN-2233 > URL: https://issues.apache.org/jira/browse/YARN-2233 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Fix 
For: 2.5.0 > > Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, > apache-yarn-2233.2.patch, apache-yarn-2233.3.patch, apache-yarn-2233.4.patch, > apache-yarn-2233.5.patch > > > Implement functionality to create, renew and cancel delegation tokens. -- This message was sent by Atlassian JIRA (v6.2#6252)
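For context, a minimal sketch of composing a client request to the new endpoint. The `/ws/v1/cluster/delegation-token` path comes from the ResourceManagerRest documentation touched by this commit; the host, renewer value, and helper names are illustrative assumptions, not part of the patch.

```java
// Hypothetical helper for building a delegation-token creation request to
// the RM web services. Only the endpoint path is taken from the patched
// documentation; everything else here is an assumption for illustration.
final class TokenRequestSketch {

    // POST target: <rm http address>/ws/v1/cluster/delegation-token
    static String tokenEndpoint(String rmAddress) {
        return rmAddress + "/ws/v1/cluster/delegation-token";
    }

    // JSON body naming the renewer, e.g. {"renewer":"yarn"}
    static String requestBody(String renewer) {
        return "{\"renewer\":\"" + renewer + "\"}";
    }
}
```

A client would POST `requestBody(...)` to `tokenEndpoint(...)` over an authenticated (e.g. Kerberos/SPNEGO) connection; renew and cancel follow the same path with different verbs per the patched docs.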
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062861#comment-14062861 ] Jason Lowe commented on YARN-1341: -- Thanks for commenting, Devaraj! My apologies for the late reply, as I was on vacation and am still catching up. bq. In addition to option 1), I'd think of making the NM down if NM fails to store RM keys for certain number of times(configurable) consecutively. As for retries, I mentioned earlier that if retries are likely to help then the state store implementation should do so rather than have the common code do so. For the leveldb implementation it is very unlikely that a retry is going to do anything other than just make the operation take longer to ultimately fail. The firmware of the drive is already going to implement a large number of retries to attempt to recover from hardware errors, and non-hardware local filesystem errors are highly unlikely to be fixed by simply retrying immediately. If that were the case then I'd expect retries to be implemented in many other places where the local filesystem is used by Hadoop code. bq. And also we can make it(i.e. tear down NM or not) as configurable I'd like to avoid adding yet more config options unless we think we really need them, but if people agree this needs to be configurable then we can do so. Also I assume in that scenario you would want the NM to shutdown while also tearing down containers, cleaning up, etc. as if it didn't support recovery. Tearing down the NM on a state store error just to have it start up again and try to recover with stale state seems pointless -- might as well have just kept running which is a better outcome. Or am I missing a use case for that? And thanks, Junping, for the recent comments! bq. If you are also agree on this, we can separate this document effort to other JIRA (Umbrella or a dedicated one, whatever you like) and continue the discussion on this particular case. 
Sure, we can discuss general error handling or an overall document for it either on YARN-1336 or a new JIRA. bq. a. if currentMasterKey is stale, it can be updated and override soon with registering to RM later. Nothing is affected. Correct, the NM should receive the current master key upon re-registration with the RM after it restarts. bq. b. if previousMasterKey is stale, then the real previous master key is lost, so the affection is: AMs with real master key cannot connect to NM to launch containers. AMs that have the current master key will still be able to connect because the NM just got the current master key as described in a). AM's that have the previous master key will not be able to connect to the NM unless that particular master key also happened to be successfully associated with the attempt in the state store (related to case c). bq. c. if applicationMasterKeys are stale, then previous old keys get tracked in applicationMasterKeys get lost after restart. The affection is: AMs with old keys cannot connect to NM to launch containers. AMs that use an old key (i.e.: not the current or previous master key) would be unable to connect to the NM. bq. Anything I am missing here? I don't believe so. The bottom line is that an AM may not be able to successfully connect to an NM after a restart with stale NM token state. > Recover NMTokens upon nodemanager restart > - > > Key: YARN-1341 > URL: https://issues.apache.org/jira/browse/YARN-1341 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, > YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
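The three staleness cases above (current key refreshed on re-registration, previous key possibly lost, per-application keys possibly lost) can be summarized in a small model. This is a hedged sketch of the discussion, not the actual NMTokenSecretManager code; all names are illustrative.

```java
import java.util.Set;

// Illustrative model of the cases above: after an NM restart, an AM's
// NM token is accepted only if its master key matches the current key
// (refreshed on re-registration with the RM), the recovered previous
// key, or a key recovered per-application from the state store.
final class NMTokenModel {

    static boolean canConnect(int amKeyId, int currentKeyId, int previousKeyId,
                              Set<Integer> recoveredAppKeyIds) {
        return amKeyId == currentKeyId
                || amKeyId == previousKeyId
                || recoveredAppKeyIds.contains(amKeyId);
    }
}
```

An AM holding a key that is neither current, previous, nor recovered for its application is the failure mode described: it simply cannot connect to the NM after restart with stale state.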
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062879#comment-14062879 ] Mayank Bansal commented on YARN-1408: - Committed to trunk, branch-2 and branch-2.5. branch-2.5 needed some rebasing; updating the patch for branch-2.5. Thanks, Mayank > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-1408-branch-2.5-1.patch, Yarn-1408.1.patch, > Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, > Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused the problem below: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. 
> ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1408: Attachment: YARN-1408-branch-2.5-1.patch Rebasing against branch-2.5 > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-1408-branch-2.5-1.patch, Yarn-1408.1.patch, > Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, > Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused the problem below: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. 
> ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2285) Preemption can cause capacity scheduler to show 5,000% queue capacity.
[ https://issues.apache.org/jira/browse/YARN-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-2285: Assignee: Wangda Tan Assigned it to me, working on this ... > Preemption can cause capacity scheduler to show 5,000% queue capacity. > -- > > Key: YARN-2285 > URL: https://issues.apache.org/jira/browse/YARN-2285 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.5.0 > Environment: Turn on CS Preemption. >Reporter: Tassapol Athiapinya >Assignee: Wangda Tan >Priority: Minor > Attachments: preemption_5000_percent.png > > > I configure queue A, B to have 1%, 99% capacity respectively. There is no max > capacity for each queue. Set high user limit factor. > Submit app 1 to queue A. AM container takes 50% of cluster memory. Task > containers take another 50%. Submit app 2 to queue B. Preempt task containers > of app 1 out. Turns out capacity of queue B increases to 99% but queue A has > 5000% used. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Attachment: YARN-2295-071514-1.patch Could not reproduce the UT failure locally. Resubmitting this patch to see if the problem reappears. > Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514-1.patch, > YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.
Tassapol Athiapinya created YARN-2297: - Summary: Preemption can hang in corner case by not allowing any task container to proceed. Key: YARN-2297 URL: https://issues.apache.org/jira/browse/YARN-2297 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Priority: Critical Preemption can cause hang issue in single-node cluster. Only AMs run. No task container can run. h3. queue configuration Queue A/B has 1% and 99% respectively. No max capacity. h3. scenario Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps. Use 1 user. Submit app 1 to queue A. AM needs 2 GB. There is 1 task that needs 2 GB. Occupy entire cluster. Submit app 2 to queue B. AM needs 2 GB. There are 3 tasks that need 2 GB each. Instead of entire app 1 preempted, app 1 AM will stay. App 2 AM will launch. No task of either app can proceed. h3. commands /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter "-Dmapreduce.map.memory.mb=2000" "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" "-Dmapreduce.randomtextwriter.bytespermap=2147483648" "-Dmapreduce.job.queuename=A" "-Dmapreduce.map.maxattempts=100" "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" "-Dmapreduce.map.java.opts=-Xmx1800M" "-Dmapreduce.randomtextwriter.mapsperhost=1" "-Dmapreduce.randomtextwriter.totalbytes=2147483648" dir1 /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep "-Dmapreduce.map.memory.mb=2000" "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" "-Dmapreduce.job.queuename=B" "-Dmapreduce.map.maxattempts=100" "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" "-Dmapreduce.map.java.opts=-Xmx1800M" -m 1 -r 0 -mt 4000 -rt 0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062911#comment-14062911 ] Hadoop QA commented on YARN-2069: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654821/YARN-2069-trunk-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4315//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4315//console This message is automatically generated. > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. 
> YARN-2113 focuses on making sure that even if a queue has its guaranteed > capacity, its individual users are treated in line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Attachment: (was: YARN-2295-071514-1.patch) > Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.
[ https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-2297: Assignee: Wangda Tan > Preemption can hang in corner case by not allowing any task container to > proceed. > - > > Key: YARN-2297 > URL: https://issues.apache.org/jira/browse/YARN-2297 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.0 >Reporter: Tassapol Athiapinya >Assignee: Wangda Tan >Priority: Critical > > Preemption can cause hang issue in single-node cluster. Only AMs run. No task > container can run. > h3. queue configuration > Queue A/B has 1% and 99% respectively. > No max capacity. > h3. scenario > Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps. Use > 1 user. > Submit app 1 to queue A. AM needs 2 GB. There is 1 task that needs 2 GB. > Occupy entire cluster. > Submit app 2 to queue B. AM needs 2 GB. There are 3 tasks that need 2 GB each. > Instead of entire app 1 preempted, app 1 AM will stay. App 2 AM will launch. > No task of either app can proceed. > h3. 
commands > /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter > "-Dmapreduce.map.memory.mb=2000" > "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" > "-Dmapreduce.randomtextwriter.bytespermap=2147483648" > "-Dmapreduce.job.queuename=A" "-Dmapreduce.map.maxattempts=100" > "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" > "-Dmapreduce.map.java.opts=-Xmx1800M" > "-Dmapreduce.randomtextwriter.mapsperhost=1" > "-Dmapreduce.randomtextwriter.totalbytes=2147483648" dir1 > /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep > "-Dmapreduce.map.memory.mb=2000" > "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" > "-Dmapreduce.job.queuename=B" "-Dmapreduce.map.maxattempts=100" > "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" > "-Dmapreduce.map.java.opts=-Xmx1800M" -m 1 -r 0 -mt 4000 -rt 0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062936#comment-14062936 ] Hadoop QA commented on YARN-2295: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655934/YARN-2295-071514-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4316//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4316//console This message is automatically generated. 
> Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-2295-071514-1.patch, YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2295: Attachment: TEST-YARN-2295-071514.patch Probably a deterministic failure on the server. Using a trivial formatting patch with no trailing tabs to see if the problem is with the server. > Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: TEST-YARN-2295-071514.patch, YARN-2295-071514-1.patch, > YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2295) Refactor YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062964#comment-14062964 ] Hadoop QA commented on YARN-2295: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655957/TEST-YARN-2295-071514.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4317//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4317//console This message is automatically generated. 
> Refactor YARN distributed shell with existing public stable API > --- > > Key: YARN-2295 > URL: https://issues.apache.org/jira/browse/YARN-2295 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > Attachments: TEST-YARN-2295-071514.patch, YARN-2295-071514-1.patch, > YARN-2295-071514.patch > > > Some API calls in YARN distributed shell have been marked as unstable and > private. Use existing public stable API to replace them, if possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062965#comment-14062965 ] Craig Welch commented on YARN-1198: --- It seems like the common problem with this group of JIRAs is that when the cluster is resource-constrained or has a small number of large jobs using most of the resources, it can get into deadlock scenarios. In addition to fixes for the specific behaviors, I think it would be worthwhile to take a min of the calculated headroom against "cluster headroom" as a sanity check, cluster headroom being the total cluster resource minus utilized resources. I've attached a partial patch for that. This will not help with the application blacklist case (1680), but it would help with 1857 and 2008 (it doesn't correct the mistake in headroom calculation, but it should prevent it from causing a deadlock). (That's not to say we should not also fix the individual issues, just that this might be a good "catch all" for others we aren't aware of / the problem generally.) I'm attaching an initial pass at doing this (it's just the basics to see if the direction makes sense, not a finished product). > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. 
> * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. > ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. > * Also today headroom is an absolute number ( I think it should be normalized > but then this is going to be not backward compatible..) > * Also when admin user refreshes queue headroom has to be updated. > These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
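The "min against cluster headroom" sanity check proposed in the comment above can be sketched as follows. This is a simplified, memory-only model under assumed names, not the actual CapacityScheduler patch:

```java
// Hedged sketch: cap the per-app headroom at what the cluster can still
// actually provide, so a miscalculated headroom can never exceed the
// cluster's free capacity and drive the AM into a deadlock.
final class HeadroomSketch {

    static long clampHeadroomMB(long computedHeadroomMB,
                                long clusterTotalMB, long clusterUsedMB) {
        // "Cluster headroom" = total cluster resource minus utilized resources.
        long clusterHeadroomMB = Math.max(0L, clusterTotalMB - clusterUsedMB);
        return Math.min(computedHeadroomMB, clusterHeadroomMB);
    }
}
```

For example, a (mistakenly) computed 10 GB headroom on a 32 GB cluster with 30 GB in use would be clamped to the 2 GB actually free.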
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.1.patch > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: YARN-1198.1.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. > ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. > * Also today headroom is an absolute number ( I think it should be normalized > but then this is going to be not backward compatible..) > * Also when admin user refreshes queue headroom has to be updated. > These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062972#comment-14062972 ] Craig Welch commented on YARN-1680: --- I was also wondering if we could maintain a Resource representing the amount of resources blacklisted by the application, updated as nodes/racks are added to or removed from the application blacklist, instead of iterating the nodes looking for the amount of blacklisted resources at the time of headroom calculation. This "blacklisted" resource would be subtracted from the cluster resource (similar to how it works in the current patch in that respect) to make sure the headroom calculation is correct. It seems like this might be a good approach as it should be "close to free" to update that blacklist resource when adding and removing things from the blacklist, and I think blacklisting may be less frequent than headroom calculation. Thoughts? > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > A job is running; reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 maps got killed), so MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > MRAppMaster does not preempt the reducers because for reducer preemption > calculation, headRoom considers blacklisted nodes' memory. 
This makes > jobs to hang forever(ResourceManager does not assing any new containers on > blacklisted nodes but returns availableResouce considers cluster free > memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
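The running-total idea from the comment above can be sketched like this. All names are illustrative, not the actual MRAppMaster or RM code, and the model is memory-only:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: track the total capacity of blacklisted nodes as nodes
// enter and leave the blacklist, so the headroom calculation can subtract
// it in O(1) rather than iterating every node each time.
final class BlacklistTracker {

    private final Map<String, Long> blacklistedMB = new HashMap<>();
    private long blacklistedTotalMB = 0L;

    void blacklist(String node, long capacityMB) {
        // putIfAbsent returns null only on first insertion, keeping the
        // running total correct if a node is blacklisted twice.
        if (blacklistedMB.putIfAbsent(node, capacityMB) == null) {
            blacklistedTotalMB += capacityMB;
        }
    }

    void unblacklist(String node) {
        Long cap = blacklistedMB.remove(node);
        if (cap != null) {
            blacklistedTotalMB -= cap;
        }
    }

    long headroomMB(long clusterTotalMB, long usedMB) {
        // Headroom excludes capacity the app can never be scheduled on.
        return Math.max(0L, clusterTotalMB - blacklistedTotalMB - usedMB);
    }
}
```

In the scenario above (32 GB cluster, 8 GB NM blacklisted), subtracting the blacklisted capacity shrinks the reported headroom, which is what would let the AM decide to preempt reducers instead of hanging.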
[jira] [Updated] (YARN-2045) Data persisted in NM should be versioned
[ https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2045: - Attachment: YARN-2045-v3.patch Thanks [~jlowe] for the above comments. In the v3 patch: - Removed the related interfaces in NMStateStoreService; we can add them back if we find them useful in the future. - To handle the old version type (String) issue, renamed DB_SCHEMA_VERSION_KEY; if data cannot be loaded against the new key, treat it as new version type 1.0. - Still keeping the PBImpl there; we can improve/remove it if we find it not useful in the future. - Addressed [~vvasudev]'s comments above. > Data persisted in NM should be versioned > > > Key: YARN-2045 > URL: https://issues.apache.org/jira/browse/YARN-2045 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.4.1 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, YARN-2045.patch > > > As a split task from YARN-667, we want to add version info to NM related > data, include: > - NodeManager local LevelDB state > - NodeManager directory structure -- This message was sent by Atlassian JIRA (v6.2#6252)
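The renamed-key fallback described in the v3 patch notes can be sketched as follows. This uses a plain map as a stand-in for the LevelDB store; the key string and baseline default are assumptions for illustration, not the patch's actual values.

```java
import java.util.Map;

// Hedged sketch of the fallback: look up the schema version under the
// renamed key; if nothing is stored there (data written before the
// rename, i.e. the old String-typed version), treat the store as the
// baseline version 1.0.
final class SchemaVersionSketch {

    static final String DB_SCHEMA_VERSION_KEY = "nm-schema-version"; // illustrative name

    static String loadVersion(Map<String, String> store) {
        String v = store.get(DB_SCHEMA_VERSION_KEY);
        return (v != null) ? v : "1.0";
    }
}
```

The design point is that old state stores never carry the new key, so "key absent" unambiguously identifies pre-rename data without having to parse the old value format.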