[jira] [Commented] (YARN-2071) Enforce more restricted permissions for the directory of Leveldb store

2014-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008407#comment-14008407
 ] 

Hudson commented on YARN-2071:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5609/])
YARN-2071. Modified levelDB store permissions to be readable only by the server 
user. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597231)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java


 Enforce more restricted permissions for the directory of Leveldb store
 --

 Key: YARN-2071
 URL: https://issues.apache.org/jira/browse/YARN-2071
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-2071.1.patch, YARN-2071.2.patch


 We need to enforce more restricted permissions for the directory of the
 Leveldb store, as we did for the filesystem generic history store.
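As a minimal illustration of the kind of tightening involved, the sketch below
creates the store directory readable only by the server user via the Hadoop
local FileSystem API; the path and the 0700 value are assumptions for the
example, not details taken from the patch:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RestrictLeveldbDir {
  public static void main(String[] args) throws Exception {
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    // Hypothetical location; the real store path comes from configuration.
    Path storeDir = new Path("/tmp/timeline/leveldb-timeline-store.ldb");
    FsPermission ownerOnly = new FsPermission((short) 0700); // rwx for owner only
    if (!localFs.exists(storeDir)) {
      localFs.mkdirs(storeDir, ownerOnly);        // create with restricted perms
    } else {
      localFs.setPermission(storeDir, ownerOnly); // tighten an existing directory
    }
  }
}
{code}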



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2059) Extend access control for admin acls

2014-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008413#comment-14008413
 ] 

Hudson commented on YARN-2059:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5609/])
YARN-2059. Added admin ACLs support to Timeline Server. Contributed by Zhijie 
Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597207)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TestTimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java
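
To make the ACL change concrete, here is a small hedged sketch of an
admin-ACL check of the kind the Timeline Server now performs; the wiring into
TimelineACLsManager is assumed rather than shown in this thread:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineAdminCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // yarn.admin.acl defaults to "*" (everyone) unless the cluster narrows it.
    AccessControlList adminAcl = new AccessControlList(
        conf.get(YarnConfiguration.YARN_ADMIN_ACL,
                 YarnConfiguration.DEFAULT_YARN_ADMIN_ACL));
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    // Admins may read timeline entities they do not own.
    System.out.println(caller.getShortUserName()
        + " is admin: " + adminAcl.isUserAllowed(caller));
  }
}
{code}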


 Extend access control for admin acls
 

 Key: YARN-2059
 URL: https://issues.apache.org/jira/browse/YARN-2059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-2059.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart

2014-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008410#comment-14008410
 ] 

Hudson commented on YARN-2096:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5609/])
YARN-2096. Race in TestRMRestart#testQueueMetricsOnRMRestart. (Anubhav Dhoot 
via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597223)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Race in TestRMRestart#testQueueMetricsOnRMRestart
 -

 Key: YARN-2096
 URL: https://issues.apache.org/jira/browse/YARN-2096
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.5.0

 Attachments: YARN-2096.patch


 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart
  fails randomly because of a race condition.
 The test validates that metrics are incremented, but does not wait for all 
 transitions to finish before checking for the values.
 It also resets metrics after kicking off recovery of the second RM. The
 metrics that need to be incremented race with this reset, causing the test to
 fail randomly.
 We need to wait for the right transitions.
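
A minimal sketch of the kind of wait the fix needs, polling a QueueMetrics
value before asserting (getAppsCompleted exists on QueueMetrics; the
surrounding test wiring is assumed):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import static org.junit.Assert.assertEquals;

public class MetricsWait {
  // Poll until the expected count is reached or the timeout expires, so the
  // assertion does not race with asynchronous RMApp/attempt transitions.
  static void waitForAppsCompleted(QueueMetrics metrics, int expected,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (metrics.getAppsCompleted() < expected
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    assertEquals(expected, metrics.getAppsCompleted());
  }
}
{code}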



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008409#comment-14008409
 ] 

Hudson commented on YARN-1937:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5609/])
YARN-1937. Added owner-only ACLs support for Timeline Client and server. 
Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597186)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/security/TestTimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java
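
For orientation, a hedged sketch of the owner-only rule this commit adds: only
the user who put an entity (or an admin) may read it. How the owner is
persisted next to the entity is an assumption here, not shown in this thread:

{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class OwnerOnlyCheck {
  // entityOwner is assumed to be stored alongside the timeline entity at put
  // time; isAdmin would come from the admin-ACL check (see YARN-2059).
  static boolean canAccess(UserGroupInformation caller, String entityOwner,
      boolean isAdmin) {
    return isAdmin
        || (caller != null && caller.getShortUserName().equals(entityOwner));
  }
}
{code}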


 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.5.0

 Attachments: YARN-1937.1.patch, YARN-1937.2.patch, YARN-1937.3.patch, 
 YARN-1937.4.patch, YARN-1937.5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2012) Fair Scheduler: allow default queue placement rule to take an arbitrary queue

2014-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008412#comment-14008412
 ] 

Hudson commented on YARN-2012:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5609/])
YARN-2012. Fair Scheduler: allow default queue placement rule to take an 
arbitrary queue (Ashwin Shankar via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597204)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 Fair Scheduler: allow default queue placement rule to take an arbitrary queue
 -

 Key: YARN-2012
 URL: https://issues.apache.org/jira/browse/YARN-2012
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.5.0

 Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt, YARN-2012-v3.txt


 Currently the 'default' rule in the queue placement policy, if applied, puts
 the app in the root.default queue. It would be great if we could make the
 'default' rule optionally point to a different queue as the default queue.
 This default queue can be a leaf queue, or it can also be a parent queue if
 the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864).
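
Presumably the allocations file would then look something like the sketch
below; the queue name root.adhoc is made up for illustration:

{code:xml}
<allocations>
  <queuePlacementPolicy>
    <rule name="specified" />
    <!-- With this change, 'default' can send apps to an arbitrary queue
         instead of always root.default: -->
    <rule name="default" queue="root.adhoc" />
  </queuePlacementPolicy>
</allocations>
{code}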



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008421#comment-14008421
 ] 

Hadoop QA commented on YARN-1368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646657/YARN-1368.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3826//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3826//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.
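
A hedged sketch of the scheduler-side shape being discussed; the event
accessor and recovery helper names below are assumptions based on the patch
file list, not confirmed API:

{code:java}
// Inside a scheduler's event handler (sketch):
private void handleNodeAdded(NodeAddedSchedulerEvent event) {
  addNode(event.getAddedRMNode());
  // Replay each container the NM reported at registration into the
  // scheduler's bookkeeping, so cluster allocation state survives RM restart.
  List<NMContainerStatus> reports = event.getContainerReports(); // assumed accessor
  if (reports != null) {
    for (NMContainerStatus status : reports) {
      recoverContainer(status); // assumed helper that rebuilds the allocation
    }
  }
}
{code}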



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008538#comment-14008538
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe],
Thanks for addressing my comments. Two more comments:
1.
bq. it's a repeated field
I think it's better to follow YARN's convention: a repeated field uses the
plural form of the word. You can check out
yarn_server_common_service_protos.proto for examples.

2. It's better to add a test for RegisterNodeManagerPBImpl as well.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)

2014-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008546#comment-14008546
 ] 

Hadoop QA commented on YARN-149:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12595828/YARN%20ResourceManager%20Automatic%20Failover-rev-08-04-13.pdf
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3829//console

This message is automatically generated.

 ResourceManager (RM) High-Availability (HA)
 ---

 Key: YARN-149
 URL: https://issues.apache.org/jira/browse/YARN-149
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Harsh J
  Labels: patch
 Attachments: YARN ResourceManager Automatic 
 Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic 
 Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, 
 rm-ha-phase1-draft2.pdf


 This jira tracks work needed to be done to support one RM instance failing 
 over to another RM instance so that we can have RM HA. Work includes leader 
 election, transfer of control to leader and client re-direction to new leader.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-25 Thread Yi Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Tian updated YARN-2083:
--

Attachment: YARN-2083.patch

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, to use instead of
 fitsIn in this case.
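
A minimal sketch of the proposed strict check; the method name comes from the
description above, and the Resource accessors are the hadoop-2.3 ones:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

public class StrictFits {
  // Unlike fitsIn, a queue whose usage has reached maxResource in either
  // dimension no longer "fits", so the pre-check rejects further containers.
  public static boolean fitsInWithoutEqual(Resource smaller, Resource bigger) {
    return smaller.getMemory() < bigger.getMemory()
        && smaller.getVirtualCores() < bigger.getVirtualCores();
  }
}
{code}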



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-25 Thread Yi Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Tian updated YARN-2083:
--

Attachment: (was: YARN-2083.patch)

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, to use instead of
 fitsIn in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1000) Dynamic resource configuration feature can be configured to enable or disable and persistent on setting or not

2014-05-25 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-1000:
--

Attachment: YARN-1000-sample.patch

The attached file is a sample implementation for switching dynamic resource
configuration on the RM.
This patch introduces the yarn.dynamic-resource-configuration.enable and
yarn.dynamic-resource-configuration-persist.enable parameters.
Please refer to it if you are interested. I'm okay with leaving it until the
proper timing. Thanks.


 Dynamic resource configuration feature can be configured to enable or disable 
 and persistent on setting or not
 --

 Key: YARN-1000
 URL: https://issues.apache.org/jira/browse/YARN-1000
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1000-sample.patch


 There are some configurations for the dynamic resource configuration feature:
 1. Enabled or not: if enabled, setting node resources at runtime through
 CLI/REST/JMX can succeed; otherwise a function-not-supported exception will be
 thrown. In the future, we may support enabling this feature on only a subset
 of nodes that have resource flexibility (such as virtual nodes).
 2. Whether a dynamic resource setting is persistent or not: it depends on the
 user's scenario whether a setting made at runtime should be kept after the NM
 goes down and restarts.
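
A small sketch of the enable switch described above; the property names come
from the earlier comment, and the guard placement is an assumption:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DynamicResourceGuard {
  // Called before honoring a CLI/REST/JMX node-resource update (sketch).
  public static void checkEnabled(Configuration conf) {
    if (!conf.getBoolean("yarn.dynamic-resource-configuration.enable", false)) {
      // Fail fast with a function-not-supported style error when disabled.
      throw new UnsupportedOperationException(
          "Dynamic resource configuration is not enabled on this node");
    }
  }
}
{code}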



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-25 Thread Yi Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008571#comment-14008571
 ] 

Yi Tian commented on YARN-2083:
---

I think a pre-check is better than an after-check. And I think we cannot do
more checking there, because we don't know the scheduled application's
information in the assignContainerPreCheck function.

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, to use instead of
 fitsIn in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-05-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008576#comment-14008576
 ] 

Sunil G commented on YARN-1408:
---

Yes. As I mentioned in an earlier comment, it is better to add back/re-create
the ResourceRequest in the CS.
Also, it would be better to do this from CapacityScheduler#killContainer, as it
is invoked directly in the preemption context.

[~jianhe], [~mayank_bansal], please share your thoughts.
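
A rough sketch of that idea; recoverResourceRequestForContainer is assumed
here as the helper that re-creates the request, and the completion call
mirrors the existing CapacityScheduler preemption path:

{code:java}
@Override
public void killContainer(RMContainer container) {
  // Re-create the container's original ResourceRequest before killing it, so
  // the application can be allocated a replacement instead of stalling.
  recoverResourceRequestForContainer(container); // assumed helper
  completedContainer(container,
      SchedulerUtils.createPreemptedContainerStatus(
          container.getContainerId(), SchedulerUtils.PREEMPTED_CONTAINER),
      RMContainerEventType.KILL);
}
{code}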


 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows:
  *  yarn.resourcemanager.scheduler.monitor.enable=true
  *  yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queues = a, b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Submit a big jobA on queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of the cluster
 capacity.
 A jobA task which uses queue b's capacity is preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 Here the ACQUIRED at KILLED invalid-state exception came when the next AM
 heartbeat reached the RM:
 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
 ACQUIRED at KILLED
 This also caused the task to hit a 30-minute timeout, as this container was
 already killed by preemption:
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008587#comment-14008587
 ] 

Hadoop QA commented on YARN-2083:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646736/YARN-2083.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3830//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3830//console

This message is automatically generated.

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.3.0

 Attachments: YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, to use instead of
 fitsIn in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)