[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628727#comment-13628727
 ] 

Hadoop QA commented on YARN-486:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578139/YARN-486-20130410.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 26 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/714//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/714//console

This message is automatically generated.

 Change startContainer NM API to accept Container as a parameter and make 
 ContainerLaunchContext user land
 -

 Key: YARN-486
 URL: https://issues.apache.org/jira/browse/YARN-486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-486.1.patch, YARN-486-20130410.txt, 
 YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, 
 YARN-486.6.patch


 Currently, the id, resource request, etc. need to be copied over from Container 
 to ContainerLaunchContext. This can be brittle. It also leads to duplication of 
 information (such as Resource from the CLC and Resource from the Container, and 
 Container.tokens). Sending Container directly to startContainer solves these 
 problems. It also keeps the CLC clean by only having things in it that are set 
 by the client/AM.
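
For illustration, a hedged sketch of the direction described above, using simplified stand-in types rather than the committed YARN API:

{code}
// Illustrative stand-in types only -- not the actual YARN interfaces. The point:
// the NM receives the RM-issued Container (id, resource, tokens) directly, and
// the launch context carries only what the client/AM itself sets.
import java.util.List;
import java.util.Map;

class StartContainerSketch {

  static class Container {              // issued and signed by the RM
    String containerId;
    int memoryMb;
    byte[] containerToken;
  }

  static class ContainerLaunchContext { // filled in by the client/AM only
    List<String> commands;
    Map<String, String> environment;
  }

  interface ContainerManagerSketch {
    // Instead of the AM copying id/resource/tokens from Container into the CLC,
    // it passes the Container itself alongside the CLC.
    void startContainer(Container container, ContainerLaunchContext launchContext);
  }
}
{code}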

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-564) Job history logs do not log anything when JVM fails to start

2013-04-11 Thread Gopal V (JIRA)
Gopal V created YARN-564:


 Summary: Job history logs do not log anything when JVM fails to 
start
 Key: YARN-564
 URL: https://issues.apache.org/jira/browse/YARN-564
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: Ubuntu 64bit 
Reporter: Gopal V
Priority: Minor
 Attachments: yarn-error.png

If the -Xmx setting in a java invocation has errors, such as the quoting issues 
that are possible with hive:

hive set mapred.map.child.java.opts=-server -Xmx2248m 
-Djava.net.preferIPv4Stack=true;

The diagnostic error message says

{code}
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
{code}

The job history UI is rather unhelpful; it says 

{code}
Attmpt state missing from History : marked as KILLED
{code}

The log files are not available either; instead the UI shows

{code}
Logs not available for attempt_1365673149565_0002_m_00_3. Aggregation may 
not be complete, Check back later or try the nodemanager at :-1
{code}





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-565) AM UI does not show splits/locality info

2013-04-11 Thread Gopal V (JIRA)
Gopal V created YARN-565:


 Summary: AM UI does not show splits/locality info 
 Key: YARN-565
 URL: https://issues.apache.org/jira/browse/YARN-565
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Gopal V
Priority: Minor


The AM UI currently shows the tasks without indicating the locality or 
speculation of a task.

This information can be read out of the logs later, but while tracking a 
slow/straggler task it is invaluable for separating the locality misses from 
other data-sensitive slow-downs or skews.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-566) Add a misconfiguration diagnostic or highlight mismatched resource.mb vs -Xmx

2013-04-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated YARN-566:
-

Labels: usability  (was: )

 Add a misconfiguration diagnostic or highlight mismatched resource.mb vs -Xmx
 -

 Key: YARN-566
 URL: https://issues.apache.org/jira/browse/YARN-566
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Gopal V
Priority: Minor
  Labels: usability

 It is possible to misconfigure a pure-Java MR application by setting the 
 mapreduce resource size to 1536 MB while leaving the -Xmx value at the 
 default 512 MB.
 The AM UI can track overall memory usage of an app, either via 
 Runtime.maxMemory() or via the process watcher in YARN, to report the actual 
 memory used by the application.
 MR does not fail even when the -Xmx memory is set lower than the YARN 
 allocation, but it spills and GCs instead to keep within the memory limit.
 This is sub-optimal, and it would make sense for YARN to highlight it in the 
 AM UI or JHS UI for tuning/debugging purposes.
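
For illustration, a minimal sketch of the mismatch described above (property names follow Hadoop 2.x MapReduce conventions; the values are examples, not recommendations):

{code}
// Hypothetical illustration of the misconfiguration: the YARN container is
// sized at 1536 MB while the task JVM heap stays at a much smaller -Xmx.
import org.apache.hadoop.conf.Configuration;

public class MemoryMismatchSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.map.memory.mb", 1536);     // resource asked from YARN
    conf.set("mapreduce.map.java.opts", "-Xmx512m");  // JVM heap left at a small default
    // The job will not fail: the task simply spills and GCs inside 512 MB while
    // YARN reserves 1536 MB -- exactly the situation the AM/JHS UI could flag.
    System.out.println(conf.get("mapreduce.map.memory.mb") + " vs "
        + conf.get("mapreduce.map.java.opts"));
  }
}
{code}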

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628826#comment-13628826
 ] 

Hudson commented on YARN-539:
-

Integrated in Hadoop-Yarn-trunk #180 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/180/])
YARN-539. Addressed memory leak of LocalResource objects NM when a resource 
localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java


 LocalizedResources are leaked in memory in case resource localization fails
 ---

 Key: YARN-539
 URL: https://issues.apache.org/jira/browse/YARN-539
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, 
 yarn-539-20130410.patch


 If resource localization fails, the resource remains in memory and is either
 1) cleaned up the next time cache cleanup runs and there is a space crunch 
 (if sufficient space is available in the cache, it will remain in memory), or
 2) reused if a LocalizationRequest comes again for the same resource.
 I think that when resource localization fails, that event should be sent to 
 the LocalResourcesTracker, which will then remove the resource from its cache.
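
A minimal, hypothetical sketch of that idea (simplified stand-in types, not the NodeManager's actual classes): on a localization-failure event the tracker evicts the entry from its in-memory cache so it is neither leaked nor reused.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for LocalResourcesTracker / ResourceEvent -- a sketch
// of the proposed behavior, not the actual patch.
class LocalResourcesTrackerSketch {
  enum ResourceEventType { REQUEST, LOCALIZED, LOCALIZATION_FAILED, RELEASE }

  private final Map<String, Object> localResourceCache = new ConcurrentHashMap<>();

  void handle(String resourceKey, ResourceEventType type) {
    if (type == ResourceEventType.LOCALIZATION_FAILED) {
      // Without this removal, the failed resource lingers in memory until a
      // cache-cleanup pass under space pressure -- the leak this JIRA fixes.
      localResourceCache.remove(resourceKey);
    }
  }
}
{code}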

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628830#comment-13628830
 ] 

Hudson commented on YARN-487:
-

Integrated in Hadoop-Yarn-trunk #180 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/180/])
YARN-487. Modify path manipulation in LocalDirsHandlerService to let 
TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 
1466746)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java


 TestDiskFailures fails on Windows due to path mishandling
 -

 Key: YARN-487
 URL: https://issues.apache.org/jira/browse/YARN-487
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-487.1.patch


 {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an 
 extra leading '/' on the path within {{LocalDirsHandlerService}} when running 
 on Windows.  The test assertions also fail to account for the fact that 
 {{Path}} normalizes '\' to '/'.
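
A hedged illustration of the kind of adjustment the assertions need (example values are hypothetical; this is not the actual patch): compare paths only after applying the same '\' to '/' normalization that {{Path}} performs.

{code}
// Hypothetical example values; the point is the separator normalization.
public class SeparatorNormalizationSketch {

  static String normalize(String p) {
    return p.replace('\\', '/');   // mirror Path's '\' -> '/' normalization
  }

  public static void main(String[] args) {
    String expectedLocalDir = "C:\\hadoop\\nm-local-dir";  // raw Windows-style string
    String reportedByService = "C:/hadoop/nm-local-dir";   // what a Path-based API returns
    // A naive equals() fails; comparing normalized forms succeeds.
    System.out.println(normalize(expectedLocalDir).equals(reportedByService)); // true
  }
}
{code}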

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-427:
--

Attachment: YARN-427-branch-2-b.patch

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, 
 YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, 
 YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-427:
--

Attachment: (was: YARN-427-branch-2-b.patch)

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, 
 YARN-427-trunk-a.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-427:
--

Attachment: (was: YARN-427-trunk-b.patch)

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, 
 YARN-427-trunk-a.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628850#comment-13628850
 ] 

Hadoop QA commented on YARN-427:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12578196/YARN-427-trunk-b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/715//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/715//console

This message is automatically generated.

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, 
 YARN-427-trunk-a.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-427:
--

Attachment: YARN-427-branch-2-b.patch
YARN-427-branch-0.23-b.patch

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-427:
--

Attachment: YARN-427-trunk-b.patch

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Aleksey Gorshkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628860#comment-13628860
 ] 

Aleksey Gorshkov commented on YARN-427:
---

Patches updated 
patch YARN-427-trunk-b.patch for trunk
patch YARN-427-branch-2-b.patch for branch-2 
patch YARN-427-branch-0.23-b.patch for branch-0.23

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628868#comment-13628868
 ] 

Hadoop QA commented on YARN-427:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12578200/YARN-427-trunk-b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/716//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/716//console

This message is automatically generated.

 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628889#comment-13628889
 ] 

Alejandro Abdelnur commented on YARN-45:


Carlo, what about a small twist?

A preempt message (instead of request, as there is no preempt response) would 
contain:

* Resources (# CPUs & # memory): total amount of resources that may be 
preempted if no action is taken by the AM.
* Set<ContainerID>: list of containers that would be killed by the RM to claim 
the resources if no action is taken by the AM.

Computing the resources is straightforward: just aggregate the resources of 
the containers in the Set<ContainerID> (see the sketch below).
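
A rough sketch of such a message and of that aggregation, with stand-in types (illustrative only, not a proposed API):

{code}
import java.util.Set;

// Stand-in types sketching the proposed preempt message: a total Resource plus
// the set of containers the RM would kill if the AM takes no action.
class PreemptMessageSketch {

  static class Resource {
    int memoryMb;
    int vcores;
    Resource(int memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
  }

  static class ContainerInfo {
    String containerId;
    Resource allocated;
    ContainerInfo(String id, Resource r) { this.containerId = id; this.allocated = r; }
  }

  // Total preemptable resources = sum over the containers at risk.
  static Resource totalAtRisk(Set<ContainerInfo> containersAtRisk) {
    int mem = 0, cpu = 0;
    for (ContainerInfo c : containersAtRisk) {
      mem += c.allocated.memoryMb;
      cpu += c.allocated.vcores;
    }
    return new Resource(mem, cpu);
  }
}
{code}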

An AM can take action using either piece of information.

If an AM releases the requested amount of resources, even if they don't match 
the received container IDs, then the AM will not be over threshold anymore, 
thus getting rid of the preemption pressure fully or partially. If the AM 
fulfills the preemption only partially, then the RM will still kill some 
containers from the set.

As the set is not ordered, the AM still does not know exactly which containers 
will be killed. So the set is just the list of containers in danger of being 
preempted.

I may be backtracking a bit on my previous comments: 'trading these containers 
for equivalent ones' seems acceptable and gives the scheduler some freedom on 
how to best take care of things if an AM is over its limit. If an AM releases 
the requested amount of resources, regardless of which containers it releases, 
the AM won't be preempted for this preemption message. We just need to clearly 
spell out the behavior.

With this approach I think we don't need #1 and #2?

Thoughts?



 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628893#comment-13628893
 ] 

Alejandro Abdelnur commented on YARN-45:


Forgot to add: unless I'm missing something, the location of the preemption is 
not important, just the capacity, right?

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628898#comment-13628898
 ] 

Hudson commented on YARN-539:
-

Integrated in Hadoop-Hdfs-trunk #1369 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/])
YARN-539. Addressed memory leak of LocalResource objects NM when a resource 
localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466756
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java


 LocalizedResources are leaked in memory in case resource localization fails
 ---

 Key: YARN-539
 URL: https://issues.apache.org/jira/browse/YARN-539
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, 
 yarn-539-20130410.patch


 If resource localization fails, the resource remains in memory and is either
 1) cleaned up the next time cache cleanup runs and there is a space crunch 
 (if sufficient space is available in the cache, it will remain in memory), or
 2) reused if a LocalizationRequest comes again for the same resource.
 I think that when resource localization fails, that event should be sent to 
 the LocalResourcesTracker, which will then remove the resource from its cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628897#comment-13628897
 ] 

Hudson commented on YARN-495:
-

Integrated in Hadoop-Hdfs-trunk #1369 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/])
YARN-495. Changed NM reboot behaviour to be a simple resync - kill all 
containers  and re-register with RM. Contributed by Jian He. (Revision 1466752)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466752
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java


 Change NM behavior of reboot to resync
 --

 Key: YARN-495
 URL: https://issues.apache.org/jira/browse/YARN-495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, 
 YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch


 When a reboot command is sent from the RM, the node manager doesn't clean up 
 the containers while it is stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628902#comment-13628902
 ] 

Hudson commented on YARN-487:
-

Integrated in Hadoop-Hdfs-trunk #1369 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1369/])
YARN-487. Modify path manipulation in LocalDirsHandlerService to let 
TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 
1466746)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466746
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java


 TestDiskFailures fails on Windows due to path mishandling
 -

 Key: YARN-487
 URL: https://issues.apache.org/jira/browse/YARN-487
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-487.1.patch


 {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an 
 extra leading '/' on the path within {{LocalDirsHandlerService}} when running 
 on Windows.  The test assertions also fail to account for the fact that 
 {{Path}} normalizes '\' to '/'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler

2013-04-11 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-567:
-

 Summary: RM changes to support preemption for FairScheduler and 
CapacityScheduler
 Key: YARN-567
 URL: https://issues.apache.org/jira/browse/YARN-567
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino



A common tradeoff in scheduling jobs is between keeping the cluster busy and 
enforcing capacity/fairness properties. FairScheduler and CapacityScheduler 
take opposite stances on how to achieve this. 

The FairScheduler leverages task-killing to quickly reclaim resources from 
currently running jobs and redistribute them among new jobs, thus keeping the 
cluster busy but wasting useful work. The CapacityScheduler is typically tuned 
to limit the portion of the cluster used by each queue so that the likelihood 
of violating capacity is low, thus never wasting work, but risking keeping the 
cluster underutilized or having jobs wait to obtain their rightful capacity. 

By introducing the notion of work-preserving preemption we can remove this 
tradeoff.  This requires a protocol for preemption (YARN-45), and 
ApplicationMasters that can respond to preemption efficiently (e.g., by saving 
their intermediate state; this will be posted for MapReduce in a separate JIRA 
soon), together with a scheduler that can issue preemption requests (discussed 
in separate JIRAs).

The changes we track with this JIRA are common to FairScheduler and 
CapacityScheduler, and are mostly propagation of preemption decisions through 
the ApplicationMastersService.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628922#comment-13628922
 ] 

Carlo Curino commented on YARN-45:
--

Our main focus for now is to rebalance capacity; in this sense, yes, location is 
not important. 

However, one can envision the use of preemption for other things as well, e.g., 
to build a monitor that tries to improve data-locality by issuing (a moderate 
amount of) relocations of a container (probably riding the same checkpointing 
mechanics we are building for MR). 

This is another case where container-based preemption can turn out to be 
useful. (At the moment this is just speculation.)


 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-568) FairScheduler: support for work-preserving preemption

2013-04-11 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-568:
-

 Summary: FairScheduler: support for work-preserving preemption 
 Key: YARN-568
 URL: https://issues.apache.org/jira/browse/YARN-568
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Carlo Curino


In the attached patch, we modified the FairScheduler to substitute its 
preemption-by-killing with a work-preserving version of preemption (followed 
by killing if the AMs do not respond quickly enough). This should allow us to 
run preemption checking more often but kill less often (proper tuning to be 
investigated).  Depends on YARN-567 and YARN-45.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-567:
--

Description: 
A common tradeoff in scheduling jobs is between keeping the cluster busy and 
enforcing capacity/fairness properties. FairScheduler and CapacityScheduler 
take opposite stances on how to achieve this. 

The FairScheduler leverages task-killing to quickly reclaim resources from 
currently running jobs and redistribute them among new jobs, thus keeping the 
cluster busy but wasting useful work. The CapacityScheduler is typically tuned 
to limit the portion of the cluster used by each queue so that the likelihood 
of violating capacity is low, thus never wasting work, but risking keeping the 
cluster underutilized or having jobs wait to obtain their rightful capacity. 

By introducing the notion of work-preserving preemption we can remove this 
tradeoff.  This requires a protocol for preemption (YARN-45), and 
ApplicationMasters that can respond to preemption efficiently (e.g., by saving 
their intermediate state; this will be posted for MapReduce in a separate JIRA 
soon), together with a scheduler that can issue preemption requests (discussed 
in separate JIRAs YARN-568 and YARN-569).

The changes we track with this JIRA are common to FairScheduler and 
CapacityScheduler, and are mostly propagation of preemption decisions through 
the ApplicationMastersService.


  was:

A common tradeoff in scheduling jobs is between keeping the cluster busy and 
enforcing capacity/fairness properties. FairScheduler and CapacityScheduler 
take opposite stances on how to achieve this. 

The FairScheduler leverages task-killing to quickly reclaim resources from 
currently running jobs and redistribute them among new jobs, thus keeping the 
cluster busy but wasting useful work. The CapacityScheduler is typically tuned 
to limit the portion of the cluster used by each queue so that the likelihood 
of violating capacity is low, thus never wasting work, but risking keeping the 
cluster underutilized or having jobs wait to obtain their rightful capacity. 

By introducing the notion of work-preserving preemption we can remove this 
tradeoff.  This requires a protocol for preemption (YARN-45), and 
ApplicationMasters that can respond to preemption efficiently (e.g., by saving 
their intermediate state; this will be posted for MapReduce in a separate JIRA 
soon), together with a scheduler that can issue preemption requests (discussed 
in separate JIRAs).

The changes we track with this JIRA are common to FairScheduler and 
CapacityScheduler, and are mostly propagation of preemption decisions through 
the ApplicationMastersService.



 RM changes to support preemption for FairScheduler and CapacityScheduler
 

 Key: YARN-567
 URL: https://issues.apache.org/jira/browse/YARN-567
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino

 A common tradeoff in scheduling jobs is between keeping the cluster busy and 
 enforcing capacity/fairness properties. FairScheduler and CapacityScheduler 
 take opposite stances on how to achieve this. 
 The FairScheduler leverages task-killing to quickly reclaim resources from 
 currently running jobs and redistribute them among new jobs, thus keeping 
 the cluster busy but wasting useful work. The CapacityScheduler is typically 
 tuned to limit the portion of the cluster used by each queue so that the 
 likelihood of violating capacity is low, thus never wasting work, but risking 
 keeping the cluster underutilized or having jobs wait to obtain their rightful 
 capacity. 
 By introducing the notion of work-preserving preemption we can remove this 
 tradeoff.  This requires a protocol for preemption (YARN-45), and 
 ApplicationMasters that can respond to preemption efficiently (e.g., by 
 saving their intermediate state; this will be posted for MapReduce in a 
 separate JIRA soon), together with a scheduler that can issue preemption 
 requests (discussed in separate JIRAs YARN-568 and YARN-569).
 The changes we track with this JIRA are common to FairScheduler and 
 CapacityScheduler, and are mostly propagation of preemption decisions through 
 the ApplicationMastersService.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-568:
--

Description: 
In the attached patch, we modified the FairScheduler to substitute its 
preemption-by-killing with a work-preserving version of preemption (followed 
by killing if the AMs do not respond quickly enough). This should allow us to 
run preemption checking more often but kill less often (proper tuning to be 
investigated).  Depends on YARN-567 and YARN-45, and is related to YARN-569.


  was:
In the attached patch, we modified the FairScheduler to substitute its 
preemption-by-killing with a work-preserving version of preemption (followed 
by killing if the AMs do not respond quickly enough). This should allow us to 
run preemption checking more often but kill less often (proper tuning to be 
investigated).  Depends on YARN-567 and YARN-45.



 FairScheduler: support for work-preserving preemption 
 --

 Key: YARN-568
 URL: https://issues.apache.org/jira/browse/YARN-568
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Carlo Curino

 In the attached patch, we modified the FairScheduler to substitute its 
 preemption-by-killing with a work-preserving version of preemption (followed 
 by killing if the AMs do not respond quickly enough). This should allow us to 
 run preemption checking more often but kill less often (proper tuning to be 
 investigated).  Depends on YARN-567 and YARN-45, and is related to YARN-569.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-567:
--

Attachment: common.patch

 RM changes to support preemption for FairScheduler and CapacityScheduler
 

 Key: YARN-567
 URL: https://issues.apache.org/jira/browse/YARN-567
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: common.patch


 A common tradeoff in scheduling jobs is between keeping the cluster busy and 
 enforcing capacity/fairness properties. FairScheduler and CapacityScheduler 
 take opposite stances on how to achieve this. 
 The FairScheduler leverages task-killing to quickly reclaim resources from 
 currently running jobs and redistribute them among new jobs, thus keeping 
 the cluster busy but wasting useful work. The CapacityScheduler is typically 
 tuned to limit the portion of the cluster used by each queue so that the 
 likelihood of violating capacity is low, thus never wasting work, but risking 
 keeping the cluster underutilized or having jobs wait to obtain their rightful 
 capacity. 
 By introducing the notion of work-preserving preemption we can remove this 
 tradeoff.  This requires a protocol for preemption (YARN-45), and 
 ApplicationMasters that can respond to preemption efficiently (e.g., by 
 saving their intermediate state; this will be posted for MapReduce in a 
 separate JIRA soon), together with a scheduler that can issue preemption 
 requests (discussed in separate JIRAs YARN-568 and YARN-569).
 The changes we track with this JIRA are common to FairScheduler and 
 CapacityScheduler, and are mostly propagation of preemption decisions through 
 the ApplicationMastersService.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-569:
--

Attachment: 3queues.pdf
CapScheduler_with_preemption.pdf

 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Carlo Curino
 Attachments: 3queues.pdf, capacity.patch, 
 CapScheduler_with_preemption.pdf


 There is a tension between the fast-paced, reactive role of the 
 CapacityScheduler, which needs to respond quickly to applications' resource 
 requests and node updates, and the more introspective, time-based 
 considerations needed to observe and correct capacity balance. To this 
 purpose, instead of hacking the delicate mechanisms of the CapacityScheduler 
 directly, we opted to add support for preemption by means of a Capacity 
 Monitor, which can optionally be run as a separate service (much like the 
 NMLivelinessMonitor).
 The capacity monitor (similar to equivalent functionality in the fair 
 scheduler) runs at intervals (e.g., every 3 seconds), observes the state of 
 the assignment of resources to queues in the capacity scheduler, performs an 
 off-line computation to determine whether preemption is needed and how best 
 to edit the current schedule to improve capacity, and generates events that 
 produce four possible actions:
 # Container de-reservations
 # Resource-based preemptions
 # Container-based preemptions
 # Container killing
 The actions listed above are progressively more costly, and it is up to the 
 policy to use them as desired to achieve the rebalancing goals. 
 Note that, due to the lag in the effect of these actions, the policy should 
 operate at the macroscopic level (e.g., preempt tens of containers from a 
 queue) and not try to tightly and consistently micromanage container 
 allocations. 
 - Preemption policy  (ProportionalCapacityPreemptionPolicy): 
 - 
 Preemption policies are pluggable by design; in the following we present an 
 initial policy (ProportionalCapacityPreemptionPolicy) we have been 
 experimenting with.  The ProportionalCapacityPreemptionPolicy behaves as 
 follows:
 # it gathers from the scheduler the state of the queues, in particular, their 
 current capacity, guaranteed capacity and pending requests (*)
 # if there are pending requests from queues that are under capacity it 
 computes a new ideal balanced state (**)
 # it computes the set of preemptions needed to repair the current schedule 
 and achieve capacity balance (accounting for natural completion rates, and 
 respecting bounds on the amount of preemption we allow for each round)
 # it selects which applications to preempt from each over-capacity queue (the 
 last one in the FIFO order)
 # it removes reservations from the most recently assigned app until the 
 amount of resource to reclaim is obtained, or until no more reservations exist
 # (if not enough) it issues preemptions for containers from the same 
 application (reverse chronological order, last assigned container first), 
 again as needed or until no containers except the AM container are left,
 # (if not enough) it moves on to unreserve and preempt from the next 
 application. 
 # containers that have been asked to preempt are tracked across executions. 
 If a container is among the ones to be preempted for more than a certain 
 time, it is moved to the list of containers to be forcibly 
 killed. 
 Notes:
 (*) at the moment, in order to avoid double-counting of the requests, we only 
 look at the ANY part of pending resource requests, which means we might not 
 preempt on behalf of AMs that ask only for specific locations but not ANY. 
 (**) The ideal balanced state is one in which each queue has at least its 
 guaranteed capacity, and the spare capacity is distributed among queues (that 
 want some) as a weighted fair share, where the weighting is based on the 
 guaranteed capacity of a queue and the computation runs to a fixed point.  
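
For readability, a hedged outline of one round of such a policy, with stand-in types (an illustration of the steps above, not the actual ProportionalCapacityPreemptionPolicy code):

{code}
// Hedged outline of one policy round; types are simplified stand-ins.
import java.util.List;

class CapacityMonitorSketch {

  interface Queue {
    double currentCapacity();     // state gathered from the scheduler (step 1)
    double guaranteedCapacity();
    double pendingRequests();
  }

  interface Scheduler { List<Queue> queues(); }

  interface Actions {             // progressively more costly actions
    void unreserve(Queue overCapacityQueue, double amount);
    void preemptContainers(Queue overCapacityQueue, double amount);
    void killContainersPreemptedTooLong(Queue overCapacityQueue);
  }

  // One round, run on an interval (e.g., every 3 seconds).
  static void editSchedule(Scheduler scheduler, Actions actions) {
    for (Queue q : scheduler.queues()) {
      double ideal = idealAssignment(q);               // step 2: ideal balanced state
      double toReclaim = q.currentCapacity() - ideal;  // step 3: how much to repair
      if (toReclaim <= 0) {
        continue;                                      // queue is not over capacity
      }
      actions.unreserve(q, toReclaim);                 // step 5: drop reservations first
      actions.preemptContainers(q, toReclaim);         // steps 6-7: then ask apps to preempt
      actions.killContainersPreemptedTooLong(q);       // step 8: kill if ignored too long
    }
  }

  // Placeholder: the real policy computes a weighted fair share of spare
  // capacity over the queues' guarantees, accounting for natural completion
  // and per-round bounds.
  static double idealAssignment(Queue q) {
    return q.guaranteedCapacity();
  }
}
{code}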
 Tunables of the ProportionalCapacityPreemptionPolicy:
 # observe-only mode (i.e., log the actions it would take, but behave as 
 read-only)
 # how frequently to run the policy
 # how long to wait between preemption and kill of a container
 # which fraction of the containers I would like to obtain should I preempt 
 (has to do with the natural rate at which containers are returned)
 # deadzone size, i.e., what % of over-capacity should I ignore (if we are off 
 perfect balance by some small % we ignore it)
 # overall amount of preemption we can afford for each run of the policy (in 
 terms of total cluster 

[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-04-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628937#comment-13628937
 ] 

Carlo Curino commented on YARN-569:
---

- Comments of attached Graphs --
The attached graph highlights the need for preemption by means of an example 
designed to illustrate it. We run 2 sort jobs over 128GB of data on a 10-node 
cluster, starting the first job in queue B (20% guaranteed capacity) and 
the second job 400sec later in queue A (80% guaranteed capacity).

We compare three scenarios:
# Default CapacityScheduler with A and B having maximum capacity set to 100%: 
the cluster utilization is high, and B runs fast since it can use the entire 
cluster when A is not around, but A needs to wait very long (almost 20 min) 
before obtaining access to all of its guaranteed capacity (and over 250 
seconds to get any container besides the AM).
# Default CapacityScheduler with A and B having maximum capacity set to 80% and 
20% respectively: A obtains its guaranteed resources immediately, but the 
cluster utilization is very low and jobs in B take over 2X longer since they 
cannot use spare overcapacity.
# CapacityScheduler + preemption: A and B are configured as in 1) but we 
preempt containers. We obtain high utilization, short runtimes for B 
(comparable to scenario 1), and prompt resources to A (within 30 seconds); the 
queue setup is sketched below.
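For reference, the queue setup behind scenarios 1-3 corresponds roughly to the 
following CapacityScheduler properties (a sketch; only the capacity and 
maximum-capacity keys are shown, with the values used in the experiment):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the two-queue setup used in the experiment (standard CapacityScheduler keys).
public class QueueSetupSketch {

  public static Configuration twoQueueSetup(boolean capAtGuarantee) {
    Configuration conf = new Configuration();
    // two queues under root with guaranteed capacities 80% (A) and 20% (B)
    conf.set("yarn.scheduler.capacity.root.queues", "A,B");
    conf.set("yarn.scheduler.capacity.root.A.capacity", "80");
    conf.set("yarn.scheduler.capacity.root.B.capacity", "20");
    // scenarios 1 and 3: both queues may grow up to the whole cluster;
    // scenario 2: each queue is capped at its guaranteed share
    conf.set("yarn.scheduler.capacity.root.A.maximum-capacity", capAtGuarantee ? "80" : "100");
    conf.set("yarn.scheduler.capacity.root.B.maximum-capacity", capAtGuarantee ? "20" : "100");
    return conf;
  }
}
{code}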

The second attached graph shows a scenario with 3 queues A, B, C with 40%, 20%, 
40% guaranteed capacity. We show more of the policy internals by plotting 
instantaneous resource utilization (as above), total pending requests, guaranteed 
capacity, ideal memory assignment, ideal preemption, and actual preemption.
 
Things to note:
# The idealized memory assignment and instantaneous resource utilization are very 
close to each other, i.e., the combination of CapacityScheduler+Preemption 
tightly follows the ideal distribution of resources.
# When only one job is running it gets 100% of the cluster; when B and A are 
running they get 33% and 66% respectively (which is a fair overcapacity assignment 
given their 20% and 40% guaranteed capacities); when all three jobs are running 
(and they want at least their capacity worth of resources) they obtain their 
guaranteed capacity.
# Actual preemption is a fraction of ideal preemption; this is because we 
account for natural completion of tasks (with a configurable parameter).
# In this experiment we do not bound the total amount of preemption per round 
(i.e., the parameter is set to 1.0).
 




 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Carlo Curino
 Attachments: 3queues.pdf, capacity.patch, 
 CapScheduler_with_preemption.pdf


 There is a tension between the fast-paced, reactive role of the 
 CapacityScheduler, which needs to respond quickly to 
 applications' resource requests and node updates, and the more introspective, 
 time-based considerations 
 needed to observe and correct for capacity balance. To this purpose, rather than 
 hacking the delicate 
 mechanisms of the CapacityScheduler directly, we opted to add support for 
 preemption by means of a Capacity Monitor, 
 which can be run optionally as a separate service (much like the 
 NMLivelinessMonitor).
 The capacity monitor (similarly to equivalent functionality in the fair 
 scheduler) runs on an interval 
 (e.g., every 3 seconds), observes the state of the assignment of resources to 
 queues in the capacity scheduler, 
 performs off-line computation to determine whether preemption is needed and how 
 best to edit the current schedule to 
 improve capacity balance, and generates events that produce four possible actions:
 # Container de-reservations
 # Resource-based preemptions
 # Container-based preemptions
 # Container killing
 The actions listed above are progressively more costly, and it is up to the 
 policy to use them as desired to achieve the rebalancing goals. 
 Note that, due to the lag in the effect of these actions, the policy should 
 operate at the macroscopic level (e.g., preempt tens of containers
 from a queue) and not try to tightly and consistently micromanage 
 container allocations. 
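 Schematically, the monitor's loop could look like the following sketch 
 (hypothetical interface and class names, not the code in capacity.patch):
{code}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; names here are hypothetical, not the patch's classes.
public class CapacityMonitorSketch {

  enum ActionKind { DERESERVE, PREEMPT_RESOURCES, PREEMPT_CONTAINERS, KILL }

  /** Read-only view of the CapacityScheduler's queue state. */
  interface SchedulerView {
    Object snapshotQueueState();
  }

  /** The pluggable policy, e.g. the ProportionalCapacityPreemptionPolicy. */
  interface EditPolicy {
    List<ActionKind> computeEdits(Object queueState);
  }

  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  /** Run the policy off the scheduler's critical path, e.g. every 3 seconds. */
  public void start(final SchedulerView scheduler, final EditPolicy policy,
      long intervalMs) {
    timer.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        Object state = scheduler.snapshotQueueState();          // observe
        for (ActionKind action : policy.computeEdits(state)) {  // decide off-line
          dispatch(action);                                     // emit events
        }
      }
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  private void dispatch(ActionKind action) {
    // hand the de-reservation / preemption / kill event to the scheduler or AM
  }
}
{code}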
 - Preemption policy  (ProportionalCapacityPreemptionPolicy): 
 - 
 Preemption policies are by design pluggable; in the following we present an 
 initial policy (ProportionalCapacityPreemptionPolicy) that we have been 
 experimenting with.  The ProportionalCapacityPreemptionPolicy behaves as 
 follows:
 # it gathers from the scheduler the state of the queues, in particular, their 
 current capacity, guaranteed capacity and pending 

[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-568:
--

Assignee: Carlo Curino

 FairScheduler: support for work-preserving preemption 
 --

 Key: YARN-568
 URL: https://issues.apache.org/jira/browse/YARN-568
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: fair.patch


 In the attached patch, we modified the FairScheduler to substitute its 
 preemption-by-killing with a work-preserving version of preemption (followed 
 by killing if the AMs do not respond quickly enough). This should allow us to 
 run preemption checking more often but kill less often (proper tuning to be 
 investigated).  Depends on YARN-567 and YARN-45; related to YARN-569.
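 As a rough illustration of the preempt-then-kill-after-a-grace-period flow 
 described above (hypothetical names, not the code in fair.patch):
{code}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of "preempt first, kill only after a grace period"; names are hypothetical.
public class PreemptThenKillSketch {

  // containerId -> time at which the AM was first asked to release it
  private final Map<String, Long> preemptionRequests = new HashMap<String, Long>();
  private final long gracePeriodMs;

  public PreemptThenKillSketch(long gracePeriodMs) {
    this.gracePeriodMs = gracePeriodMs;
  }

  /** Work-preserving step: ask the AM to release the container voluntarily. */
  public void requestPreemption(String containerId) {
    if (!preemptionRequests.containsKey(containerId)) {
      preemptionRequests.put(containerId, System.currentTimeMillis());
      notifyAmToRelease(containerId); // e.g. via the RM-AM protocol of YARN-45
    }
  }

  /** Called on every preemption-check round: kill only what was not released in time. */
  public void enforce() {
    long now = System.currentTimeMillis();
    Iterator<Map.Entry<String, Long>> it = preemptionRequests.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (now - e.getValue() > gracePeriodMs) {
        killContainer(e.getKey()); // fall back to the old preemption-by-killing
        it.remove();
      }
    }
  }

  private void notifyAmToRelease(String containerId) { }
  private void killContainer(String containerId) { }
}
{code}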

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-569:
--

Assignee: Carlo Curino

 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: 3queues.pdf, capacity.patch, 
 CapScheduler_with_preemption.pdf


 There is a tension between the fast-paced, reactive role of the 
 CapacityScheduler, which needs to respond quickly to 
 applications' resource requests and node updates, and the more introspective, 
 time-based considerations 
 needed to observe and correct for capacity balance. To this purpose, rather than 
 hacking the delicate 
 mechanisms of the CapacityScheduler directly, we opted to add support for 
 preemption by means of a Capacity Monitor, 
 which can be run optionally as a separate service (much like the 
 NMLivelinessMonitor).
 The capacity monitor (similarly to equivalent functionality in the fair 
 scheduler) runs on an interval 
 (e.g., every 3 seconds), observes the state of the assignment of resources to 
 queues in the capacity scheduler, 
 performs off-line computation to determine whether preemption is needed and how 
 best to edit the current schedule to 
 improve capacity balance, and generates events that produce four possible actions:
 # Container de-reservations
 # Resource-based preemptions
 # Container-based preemptions
 # Container killing
 The actions listed above are progressively more costly, and it is up to the 
 policy to use them as desired to achieve the rebalancing goals. 
 Note that, due to the lag in the effect of these actions, the policy should 
 operate at the macroscopic level (e.g., preempt tens of containers
 from a queue) and not try to tightly and consistently micromanage 
 container allocations. 
 - Preemption policy  (ProportionalCapacityPreemptionPolicy): 
 - 
 Preemption policies are by design pluggable; in the following we present an 
 initial policy (ProportionalCapacityPreemptionPolicy) that we have been 
 experimenting with.  The ProportionalCapacityPreemptionPolicy behaves as 
 follows:
 # it gathers from the scheduler the state of the queues, in particular, their 
 current capacity, guaranteed capacity and pending requests (*)
 # if there are pending requests from queues that are under capacity it 
 computes a new ideal balanced state (**)
 # it computes the set of preemptions needed to repair the current schedule 
 and achieve capacity balance (accounting for natural completion rates, and 
 respecting bounds on the amount of preemption we allow for each round)
 # it selects which applications to preempt from each over-capacity queue (the 
 last one in the FIFO order)
 # it removes reservations from the most recently assigned app until the amount 
 of resource to reclaim is obtained, or until no more reservations exist
 # (if not enough) it issues preemptions for containers from the same 
 application (in reverse chronological order, last assigned container first), 
 again until the target is met or until no containers except the AM container are left
 # (if not enough) it moves on to unreserve and preempt from the next 
 application 
 # containers that have been asked to preempt are tracked across executions; if 
 a container remains among those to be preempted for more than a certain 
 time, it is moved to the list of containers to be forcibly 
 killed. 
 Notes:
 (*) at the moment, in order to avoid double-counting of the requests, we only 
 look at the ANY part of pending resource requests, which means we might not 
 preempt on behalf of AMs that ask only for specific locations but not any. 
 (**) The ideal balanced state is one in which each queue has at least its 
 guaranteed capacity, and the spare capacity is distributed among the queues that 
 want some as a weighted fair share, where the weighting is based on the 
 guaranteed capacity of a queue, and the computation runs to a fixed point.  
 Tunables of the ProportionalCapacityPreemptionPolicy:
 # observe-only mode (i.e., log the actions it would take, but behave as 
 read-only)
 # how frequently to run the policy
 # how long to wait between preemption and kill of a container
 # which fraction of the containers I would like to obtain should I preempt 
 (has to do with the natural rate at which containers are returned)
 # deadzone size, i.e., what % of over-capacity should I ignore (if we are off 
 perfect balance by some small % we ignore it)
 # overall amount of preemption we can afford for each run of the policy (in 
 terms of total cluster capacity)
 In our 

[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628938#comment-13628938
 ] 

Alejandro Abdelnur commented on YARN-45:


I'm just trying to see if we can have (at least for now) a single message type 
instead of two that satisfies the usecases. Regarding keeping the tighter 
semantics, if not difficult/complex, I'm OK with it. Thanks.

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-495) Change NM behavior of reboot to resync

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628944#comment-13628944
 ] 

Hudson commented on YARN-495:
-

Integrated in Hadoop-Mapreduce-trunk #1396 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/])
YARN-495. Changed NM reboot behaviour to be a simple resync - kill all 
containers  and re-register with RM. Contributed by Jian He. (Revision 1466752)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1466752
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeAction.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java


 Change NM behavior of reboot to resync
 --

 Key: YARN-495
 URL: https://issues.apache.org/jira/browse/YARN-495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch, 
 YARN-495.4.patch, YARN-495.5.patch, YARN-495.6.patch


 When a reboot command is sent from the RM, the node manager doesn't clean up the 
 containers while it is stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628949#comment-13628949
 ] 

Hudson commented on YARN-487:
-

Integrated in Hadoop-Mapreduce-trunk #1396 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/])
YARN-487. Modify path manipulation in LocalDirsHandlerService to let 
TestDiskFailures pass on Windows. Contributed by Chris Nauroth. (Revision 
1466746)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1466746
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java


 TestDiskFailures fails on Windows due to path mishandling
 -

 Key: YARN-487
 URL: https://issues.apache.org/jira/browse/YARN-487
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-487.1.patch


 {{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an 
 extra leading '/' on the path within {{LocalDirsHandlerService}} when running 
 on Windows.  The test assertions also fail to account for the fact that 
 {{Path}} normalizes '\' to '/'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628945#comment-13628945
 ] 

Hudson commented on YARN-539:
-

Integrated in Hadoop-Mapreduce-trunk #1396 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1396/])
YARN-539. Addressed memory leak of LocalResource objects NM when a resource 
localization fails. Contributed by Omkar Vinit Joshi. (Revision 1466756)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1466756
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/event/ResourceFailedLocalizationEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java


 LocalizedResources are leaked in memory in case resource localization fails
 ---

 Key: YARN-539
 URL: https://issues.apache.org/jira/browse/YARN-539
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-539-20130410.1.patch, yarn-539-20130410.2.patch, 
 yarn-539-20130410.patch


 If resource localization fails, the resource remains in memory and is
 1) either cleaned up the next time cache cleanup runs and there is a space 
 crunch (if sufficient space is available in the cache, it will remain in 
 memory), or
 2) reused if a LocalizationRequest comes in again for the same resource.
 I think that when resource localization fails, that event should be sent to the 
 LocalResourceTracker, which will then remove it from its cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628950#comment-13628950
 ] 

Carlo Curino commented on YARN-45:
--

Agreed on a single message, where the semantics are:
1) if both Set<ContainerID> and ResourceRequest are specified, then it is as 
described (they overlap, and you have to give me back at least the resources I ask 
for, otherwise these containers are at risk of getting killed)
2) if only Set<ContainerID> is specified, the stricter semantics of "I want 
these containers back and nothing else" apply
3) if only ResourceRequest is specified, the semantics are "please give me back 
this many resources", without binding which containers are at risk (this might be 
good for policies that do not want to think about containers unless it is 
really time to kill them).

Does this work for you? Seems to capture the combination of what we proposed so 
far.
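A possible shape for such a single message, sketched purely for illustration 
(field and method names are assumptions, not the record defined in the attached 
patch):
{code}
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the single RM-to-AM preemption message discussed above.
public class PreemptionMessageSketch {

  private final Set<String> containerIds;  // containers at risk (possibly empty)
  private final long resourcesToReleaseMb; // amount asked back (0 if unset)

  public PreemptionMessageSketch(Set<String> containerIds, long resourcesToReleaseMb) {
    this.containerIds =
        containerIds == null ? Collections.<String>emptySet() : containerIds;
    this.resourcesToReleaseMb = resourcesToReleaseMb;
  }

  /** The three cases above, depending on which parts are populated. */
  public String semantics() {
    if (!containerIds.isEmpty() && resourcesToReleaseMb > 0) {
      return "give back at least this much, or the listed containers are at risk"; // case 1
    } else if (!containerIds.isEmpty()) {
      return "I want exactly these containers back";                                // case 2
    } else {
      return "give back this many resources, your choice of containers";            // case 3
    }
  }
}
{code}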

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629070#comment-13629070
 ] 

Alejandro Abdelnur commented on YARN-45:


sounds good

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-563) Add application type to ApplicationReport

2013-04-11 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-563:


Issue Type: Sub-task  (was: Improvement)
Parent: YARN-386

 Add application type to ApplicationReport 
 --

 Key: YARN-563
 URL: https://issues.apache.org/jira/browse/YARN-563
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Thomas Weise

 This field is needed to distinguish different types of applications (app 
 master implementations). For example, we may run applications of type XYZ in 
 a cluster alongside MR and would like to filter applications by type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629147#comment-13629147
 ] 

Omkar Vinit Joshi commented on YARN-547:


There are a couple of now-invalid transitions for LocalizedResource. Updating them 
as a part of this patch:
* From INIT state
** From INIT to INIT on RELEASE event. This is not possible now, as a new resource 
is created in INIT state on a REQUEST event and is immediately moved to DOWNLOADING 
state. With the [yarn-539|https://issues.apache.org/jira/browse/YARN-539] fix, 
the resource will never move back from LOCALIZED or DOWNLOADING state 
to INIT state.
** From INIT to LOCALIZED on LOCALIZED event. This too is impossible now.
* From DOWNLOADING state
** From DOWNLOADING to DOWNLOADING on REQUEST event. Updating the transition: 
earlier it started one more localization; now it just adds the requesting 
container to the LocalizedResource container list.
* From LOCALIZED state
** Resource will never get a LOCALIZED event in LOCALIZED state, so removing that 
transition. Earlier this was possible because there were multiple downloads for 
the same resource; now it is not. (A rough summary of the resulting transitions is 
sketched below.)
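For orientation, the transitions discussed above roughly amount to the following 
summary (an illustrative sketch with hypothetical enum and method names, not the 
actual LocalizedResource state machine code):
{code}
// Rough, illustrative summary of the LocalizedResource transitions discussed above.
public class LocalizedResourceTransitionsSketch {

  enum State { INIT, DOWNLOADING, LOCALIZED }
  enum Event { REQUEST, LOCALIZED, RELEASE }

  static State next(State s, Event e) {
    switch (s) {
      case INIT:
        // INIT -> INIT on RELEASE and INIT -> LOCALIZED on LOCALIZED are gone: a
        // resource is created in INIT and immediately moves to DOWNLOADING on REQUEST
        if (e == Event.REQUEST) {
          return State.DOWNLOADING;
        }
        break;
      case DOWNLOADING:
        if (e == Event.REQUEST) {
          // no longer starts a second localization; it only adds the requesting
          // container to the resource's waiting list
          return State.DOWNLOADING;
        }
        if (e == Event.LOCALIZED) {
          return State.LOCALIZED;
        }
        if (e == Event.RELEASE) {
          // just updates the container list / ref count
          return State.DOWNLOADING;
        }
        break;
      case LOCALIZED:
        // a LOCALIZED event can no longer arrive here (single download per resource);
        // REQUEST / RELEASE only update the reference list and keep the state
        if (e == Event.REQUEST || e == Event.RELEASE) {
          return State.LOCALIZED;
        }
        break;
    }
    // failure-related transitions (YARN-539) omitted for brevity
    throw new IllegalStateException("invalid transition: " + s + " on " + e);
  }
}
{code}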

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 At present, when multiple containers request the same localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in DOWNLOADING state).
 2) If, while in this state, multiple ResourceRequestEvents come in, then 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present. 
 Location: ResourceLocalizationService.addResource .. the addition of the request 
 into attempts in case an event already exists.
 The root cause for this is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-559) Make all YARN API and libraries available through an api jar

2013-04-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned YARN-559:
--

Assignee: Vinod Kumar Vavilapalli

 Make all YARN API and libraries available through an api jar
 

 Key: YARN-559
 URL: https://issues.apache.org/jira/browse/YARN-559
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Vinod Kumar Vavilapalli

 This should be the dependency for interacting with YARN and would prevent 
 unnecessary leakage of other internal stuff.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl

2013-04-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629231#comment-13629231
 ] 

Xuan Gong commented on YARN-457:


First of all, I think the changes will be in AllocateResponsePBImpl; there is no 
AMResponsePBImpl anymore. Could you update to the latest trunk version, please?
I think we need to change the whole setUpdatedNodes function definition; only 
changing the if block is not enough. The whole change may look like this:
if (updatedNodes == null) {
    return;
}
initLocalNewNodeReportList();
this.updatedNodes.addAll(updatedNodes);

The way we implement setUpdatedNodes would then mirror the way 
setAllocatedContainers() is implemented in AllocateResponsePBImpl.
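Spelled out, the suggested change would look roughly like this (a sketch only, 
assuming the setter takes a List<NodeReport> like the sibling setters in 
AllocateResponsePBImpl):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;

// Sketch only: not the actual AllocateResponsePBImpl, just the shape of the suggested fix.
public class AllocateResponseSketch {

  private List<NodeReport> updatedNodes;

  public synchronized void setUpdatedNodes(final List<NodeReport> updatedNodes) {
    if (updatedNodes == null) {
      return;                       // nothing to copy; also avoids the NPE reported here
    }
    initLocalNewNodeReportList();   // lazily create the local list (PBImpl pattern)
    this.updatedNodes.addAll(updatedNodes);
  }

  private void initLocalNewNodeReportList() {
    if (this.updatedNodes == null) {
      this.updatedNodes = new ArrayList<NodeReport>();
    }
  }
}
{code}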



 Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
 

 Key: YARN-457
 URL: https://issues.apache.org/jira/browse/YARN-457
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Kenji Kikushima
Priority: Minor
  Labels: Newbie
 Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457.patch


 {code}
 if (updatedNodes == null) {
   this.updatedNodes.clear();
   return;
 }
 {code}
 If updatedNodes is already null, a NullPointerException is thrown.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629256#comment-13629256
 ] 

Vinod Kumar Vavilapalli commented on YARN-486:
--

I committed this to trunk, it isn't merging into branch-2 though, can you 
please check?

 Change startContainer NM API to accept Container as a parameter and make 
 ContainerLaunchContext user land
 -

 Key: YARN-486
 URL: https://issues.apache.org/jira/browse/YARN-486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-486.1.patch, YARN-486-20130410.txt, 
 YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, 
 YARN-486.6.patch


 Currently, id, resource request etc need to be copied over from Container to 
 ContainerLaunchContext. This can be brittle. Also it leads to duplication of 
 information (such as Resource from CLC and Resource from Container and 
 Container.tokens). Sending Container directly to startContainer solves these 
 problems. It also makes the CLC clean by only having stuff in it that is set by 
 the client/AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629265#comment-13629265
 ] 

Hudson commented on YARN-486:
-

Integrated in Hadoop-trunk-Commit #3596 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3596/])
YARN-486. Changed NM's startContainer API to accept Container record given 
by RM as a direct parameter instead of as part of the ContainerLaunchContext 
record. Contributed by Xuan Gong.
MAPREDUCE-5139. Update MR AM to use the modified startContainer API after 
YARN-486. Contributed by Xuan Gong. (Revision 1467063)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467063
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 

[jira] [Commented] (YARN-563) Add application type to ApplicationReport

2013-04-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629204#comment-13629204
 ] 

Hitesh Shah commented on YARN-563:
--

+1 on the suggestion. If you are working on this, a few comments: 

  - applicationType should also be part of ApplicationSubmissionContext
  - the command-line tool to list applications (bin/yarn tool) should support 
filtering based on type
  - the type should be a string (a rough sketch follows below)
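Concretely, the suggestion amounts to something along these lines (a sketch; the 
method names are assumptions, not the committed API):
{code}
// Sketch only: hypothetical method names, not the committed YARN API.
public interface ApplicationTypeAdditions {

  /** Free-form application type string, e.g. "MAPREDUCE" or "XYZ", set at submission
   *  time on the ApplicationSubmissionContext and echoed back in ApplicationReport
   *  so that the bin/yarn application-listing tool can filter on it. */
  void setApplicationType(String applicationType);

  String getApplicationType();
}
{code}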

 Add application type to ApplicationReport 
 --

 Key: YARN-563
 URL: https://issues.apache.org/jira/browse/YARN-563
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Thomas Weise

 This field is needed to distinguish different types of applications (app 
 master implementations). For example, we may run applications of type XYZ in 
 a cluster alongside MR and would like to filter applications by type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-542:
-

Description: 
Today, the global AM max-attempts is set to 1, which is a bad choice. AM 
max-attempts accounts for both AM-level failures as well as container crashes 
due to localization issues, lost nodes, etc. To account for AM crashes due to 
problems that are not caused by user code, mainly lost nodes, we want to give 
AMs some retries.

I propose we change it to at least two. We can change it to 4 to match other 
retry-configs.

  was:
Today, the AM max-retries is set to 1, which is a bad choice. AM max-retries 
accounts for both AM-level failures as well as container crashes due to 
localization issues, lost nodes, etc. To account for AM crashes due to problems 
that are not caused by user code, mainly lost nodes, we want to give AMs some 
retries.

I propose we change it to at least two. We can change it to 4 to match other 
retry-configs.


 Change the default global AM max-attempts value to be not one
 -

 Key: YARN-542
 URL: https://issues.apache.org/jira/browse/YARN-542
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen

 Today, the global AM max-attempts is set to 1, which is a bad choice. AM 
 max-attempts accounts for both AM-level failures as well as container crashes 
 due to localization issues, lost nodes, etc. To account for AM crashes due to 
 problems that are not caused by user code, mainly lost nodes, we want to give 
 AMs some retries.
 I propose we change it to at least two. We can change it to 4 to match other 
 retry-configs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-542:
-

Summary: Change the default global AM max-attempts value to be not one  
(was: Change the default AM retry value to be not one)

 Change the default global AM max-attempts value to be not one
 -

 Key: YARN-542
 URL: https://issues.apache.org/jira/browse/YARN-542
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen

 Today, the AM max-retries is set to 1, which is a bad choice. AM max-retries 
 accounts for both AM-level failures as well as container crashes due to 
 localization issues, lost nodes, etc. To account for AM crashes due to problems 
 that are not caused by user code, mainly lost nodes, we want to give AMs some 
 retries.
 I propose we change it to at least two. We can change it to 4 to match other 
 retry-configs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629335#comment-13629335
 ] 

Xuan Gong commented on YARN-486:


Cannot merge into branch-2, because the test case 
TestFairScheduler:testNotAllowSubmitApplication, which was introduced by YARN-319, 
does not exist in that branch; it looks like the YARN-319 patch was not committed 
to branch-2.

 Change startContainer NM API to accept Container as a parameter and make 
 ContainerLaunchContext user land
 -

 Key: YARN-486
 URL: https://issues.apache.org/jira/browse/YARN-486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-486.1.patch, YARN-486-20130410.txt, 
 YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, 
 YARN-486.6.patch


 Currently, id, resource request etc need to be copied over from Container to 
 ContainerLaunchContext. This can be brittle. Also it leads to duplication of 
 information (such as Resource from CLC and Resource from Container and 
 Container.tokens). Sending Container directly to startContainer solves these 
 problems. It also makes the CLC clean by only having stuff in it that is set by 
 the client/AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs

2013-04-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629377#comment-13629377
 ] 

Xuan Gong commented on YARN-441:


Add void setServiceResponse(String key, ByteBuffer value) back to the 
StartContainerResponse interface.

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch


 There's a bunch of unused methods like getAskCount() and getAsk(index) in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN, they were found in the following (MR will have its own set):
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs

2013-04-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-441:
---

Attachment: YARN-441.3.patch

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch


 There's a bunch of unused methods like getAskCount() and getAsk(index) in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN, they were found in the following (MR will have its own set):
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629414#comment-13629414
 ] 

Hadoop QA commented on YARN-441:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578280/YARN-441.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/717//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/717//console

This message is automatically generated.

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch


 There's a bunch of unused methods like getAskCount() and getAsk(index) in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN, they were found in the following (MR will have its own set):
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-542) Change the default global AM max-attempts value to be not one

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-542:
-

Attachment: YARN-542.1.patch

I've drafted a patch, which includes the following modifications:

1. Change the default value of yarn.resourcemanager.am.max-attempts from 1 to 2 
(see the sketch below).

2. In the test cases where more than one attempt is set, 
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS is used instead of hard-coded 
values.

3. Assert that the configured maxAttempts is greater than 1 wherever the 
difference between one attempt and more than one matters.
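For context, a rough sketch of how the global cap would be read on the RM side 
(the property name is the one above; the patch bumps its default, 
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS, from 1 to 2):
{code}
import org.apache.hadoop.conf.Configuration;

// Rough illustration only; with the patch the fallback default becomes 2 instead of 1.
public class AmMaxAttemptsSketch {
  public static int globalMaxAttempts(Configuration conf) {
    return conf.getInt("yarn.resourcemanager.am.max-attempts", 2);
  }
}
{code}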

 Change the default global AM max-attempts value to be not one
 -

 Key: YARN-542
 URL: https://issues.apache.org/jira/browse/YARN-542
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-542.1.patch


 Today, the global AM max-attempts is set to 1, which is a bad choice. AM 
 max-attempts accounts for both AM-level failures as well as container crashes 
 due to localization issues, lost nodes, etc. To account for AM crashes due to 
 problems that are not caused by user code, mainly lost nodes, we want to give 
 AMs some retries.
 I propose we change it to at least two. We can change it to 4 to match other 
 retry-configs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.

2013-04-11 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-319:
-

Fix Version/s: (was: 2.0.3-alpha)
   2.0.5-beta

Even though the fix version is set to 2.0.3, it isn't merged into branch-2 at 
all. I just merged it there for 2.0.5-beta and am changing the fix version.

 Submit a job to a queue that not allowed in fairScheduler, client will hold 
 forever.
 

 Key: YARN-319
 URL: https://issues.apache.org/jira/browse/YARN-319
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: shenhong
Assignee: shenhong
 Fix For: 2.0.5-beta

 Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, 
 YARN-319.patch


 When the RM uses the FairScheduler and a client submits a job to a queue that 
 does not allow the user to submit jobs to it, the client will hang forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629467#comment-13629467
 ] 

Omkar Vinit Joshi commented on YARN-547:


Fix details:
* Underlying problem: a new localization was getting started for a resource even 
when it was already in DOWNLOADING state when a ResourceRequestEvent arrived.
* Solution: fixed the unwanted transition; a REQUEST event in DOWNLOADING 
state now just adds the container to the waiting list.
* Tests: making sure that a resource never moves back to INIT state even when the 
requesting container releases it before localization completes; a RELEASE event 
while the resource is in DOWNLOADING state just updates the container list (ref).

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 At present, when multiple containers request the same localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in DOWNLOADING state).
 2) If, while in this state, multiple ResourceRequestEvents come in, then 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present. 
 Location: ResourceLocalizationService.addResource .. the addition of the request 
 into attempts in case an event already exists.
 The root cause for this is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-547:
---

Attachment: yarn-547-20130411.patch

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.patch


 At present, when multiple containers request the same localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in DOWNLOADING state).
 2) If, while in this state, multiple ResourceRequestEvents come in, then 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present. 
 Location: ResourceLocalizationService.addResource .. the addition of the request 
 into attempts in case an event already exists.
 The root cause for this is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629475#comment-13629475
 ] 

Xuan Gong commented on YARN-486:


Another issue is that YARN-488 is not committed into branch-2 either.
It makes the following change in 
TestContainerManagerSecurity:submitAndRegisterApplication:

 ContainerLaunchContext amContainer = BuilderUtils
     .newContainerLaunchContext(null, testUser, BuilderUtils
         .newResource(1024, 1), Collections.<String, LocalResource>emptyMap(),
-        new HashMap<String, String>(), Arrays.asList("sleep", "100"),
+        new HashMap<String, String>(), cmd,
         new HashMap<String, ByteBuffer>(), null,
         new HashMap<ApplicationAccessType, String>());
 

 Change startContainer NM API to accept Container as a parameter and make 
 ContainerLaunchContext user land
 -

 Key: YARN-486
 URL: https://issues.apache.org/jira/browse/YARN-486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-486.1.patch, YARN-486-20130410.txt, 
 YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, 
 YARN-486.6.patch


 Currently, id, resource request etc need to be copied over from Container to 
 ContainerLaunchContext. This can be brittle. Also it leads to duplication of 
 information (such as Resource from CLC and Resource from Container and 
 Container.tokens). Sending Container directly to startContainer solves these 
 problems. It also makes the CLC clean by only having stuff in it that is set by 
 the client/AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629510#comment-13629510
 ] 

Hadoop QA commented on YARN-547:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12578295/yarn-547-20130411.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalizedResource

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/719//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/719//console

This message is automatically generated.

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.patch


 At present, when multiple containers request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) If multiple ResourceRequestEvents then arrive while it is in this state, 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource (the addition of the 
 request to the attempts list when an event already exists).
 The root cause is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629514#comment-13629514
 ] 

Hadoop QA commented on YARN-486:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12578308/YARN-486.6.branch2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/720//console

This message is automatically generated.

 Change startContainer NM API to accept Container as a parameter and make 
 ContainerLaunchContext user land
 -

 Key: YARN-486
 URL: https://issues.apache.org/jira/browse/YARN-486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-486.1.patch, YARN-486-20130410.txt, 
 YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch, YARN-486.5.patch, 
 YARN-486.6.branch2.patch, YARN-486.6.patch


 Currently, id, resource request etc need to be copied over from Container to 
 ContainerLaunchContext. This can be brittle. Also it leads to duplication of 
 information (such as Resource from CLC and Resource from Container and 
 Container.tokens). Sending Container directly to startContainer solves these 
 problems. It also makes the CLC clean by only having stuff in it that is set by 
 the client/AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.

2013-04-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629549#comment-13629549
 ] 

Hudson commented on YARN-319:
-

Integrated in Hadoop-trunk-Commit #3603 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3603/])
Fixing CHANGES.txt entry for YARN-319. (Revision 1467133)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467133
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Submit a job to a queue that not allowed in fairScheduler, client will hold 
 forever.
 

 Key: YARN-319
 URL: https://issues.apache.org/jira/browse/YARN-319
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: shenhong
Assignee: shenhong
 Fix For: 2.0.5-beta

 Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, 
 YARN-319.patch


 The RM uses the FairScheduler; when a client submits a job to a queue that does 
 not allow the user to submit jobs to it, the client will hang forever.
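
For illustration, a minimal sketch of the general idea behind a fix
(hypothetical names, assuming the approach of rejecting the submission
explicitly instead of dropping it silently):

{code}
// Hypothetical sketch: when the queue's ACL check fails, reject the application
// explicitly so the submitting client gets a terminal state instead of waiting
// forever. These interfaces are illustrative, not the FairScheduler code.
public class QueueAclCheckSketch {

  interface Queue { boolean canSubmit(String user); }

  interface AppEventSink { void appRejected(String appId, String reason); }

  private final AppEventSink events;

  public QueueAclCheckSketch(AppEventSink events) { this.events = events; }

  public boolean submit(String appId, String user, Queue queue) {
    if (!queue.canSubmit(user)) {
      events.appRejected(appId,
          "User " + user + " is not allowed to submit to this queue");
      return false;                 // client sees a rejected/failed report
    }
    // ... proceed with normal scheduling (omitted)
    return true;
  }
}
{code}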

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629569#comment-13629569
 ] 

Omkar Vinit Joshi commented on YARN-547:


The failed test is actually exercising transitions that are now invalid. Fixing it.

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch


 At present, when multiple containers request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) If multiple ResourceRequestEvents then arrive while it is in this state, 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource (the addition of the 
 request to the attempts list when an event already exists).
 The root cause is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-547:
---

Attachment: yarn-547-20130411.1.patch

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch


 At present, when multiple containers request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) If multiple ResourceRequestEvents then arrive while it is in this state, 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource (the addition of the 
 request to the attempts list when an event already exists).
 The root cause is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-547:
---

Attachment: (was: yarn-547-20130411.1.patch)

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch


 At present, when multiple containers request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) If multiple ResourceRequestEvents then arrive while it is in this state, 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource (the addition of the 
 request to the attempts list when an event already exists).
 The root cause is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-11 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-547:
---

Attachment: yarn-547-20130411.1.patch

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch


 At present, when multiple containers request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) If multiple ResourceRequestEvents then arrive while it is in this state, 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource (the addition of the 
 request to the attempts list when an event already exists).
 The root cause is the presence of the FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-45:
---

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-386

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629620#comment-13629620
 ] 

Bikas Saha commented on YARN-45:


All API changes at this point are being tracked under YARN-386

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629635#comment-13629635
 ] 

Karthik Kambatla commented on YARN-45:
--

Great discussion, glad to see this coming along well. Carlo's latest comment 
makes sense to me.

Let me know if I understand it right: the ResourceRequest part of the message 
can capture locality, and the AM will try to give back Resources on each node 
as per this locality information?

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629638#comment-13629638
 ] 

Karthik Kambatla commented on YARN-45:
--

[~bikassaha], shouldn't this be under YARN-397?

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-567:
--

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-397

 RM changes to support preemption for FairScheduler and CapacityScheduler
 

 Key: YARN-567
 URL: https://issues.apache.org/jira/browse/YARN-567
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: common.patch


 A common tradeoff in scheduling jobs is between keeping the cluster busy and 
 enforcing capacity/fairness properties. FairScheduler and CapacityScheduler 
 take opposite stances on how to achieve this. 
 The FairScheduler leverages task-killing to quickly reclaim resources from 
 currently running jobs and redistribute them among new jobs, thus keeping 
 the cluster busy but wasting useful work. The CapacityScheduler is typically 
 tuned to limit the portion of the cluster used by each queue so that the 
 likelihood of violating capacity is low, thus never wasting work, but risking 
 keeping the cluster underutilized or leaving jobs waiting to obtain their 
 rightful capacity. 
 By introducing the notion of work-preserving preemption we can remove this 
 tradeoff. This requires a protocol for preemption (YARN-45), and 
 ApplicationMasters that can respond to preemption efficiently (e.g., by 
 saving their intermediate state; this will be posted for MapReduce in a 
 separate JIRA soon), together with a scheduler that can issue preemption 
 requests (discussed in the separate JIRAs YARN-568 and YARN-569).
 The changes we track with this JIRA are common to the FairScheduler and 
 CapacityScheduler, and are mostly the propagation of preemption decisions 
 through the ApplicationMasterService.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-568:
--

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-397

 FairScheduler: support for work-preserving preemption 
 --

 Key: YARN-568
 URL: https://issues.apache.org/jira/browse/YARN-568
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: fair.patch


 In the attached patch, we modified the FairScheduler to substitute its 
 preemption-by-killing with a work-preserving version of preemption (followed 
 by killing if the AMs do not respond quickly enough). This should allow us to 
 run the preemption check more often, but kill less often (proper tuning to be 
 investigated). Depends on YARN-567 and YARN-45; related to YARN-569.
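
For illustration only, a minimal sketch of the warn-then-kill idea
(hypothetical interfaces, not the FairScheduler code): ask the AM to release
the container first, and kill it only if it is still running after a grace
period.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: work-preserving preemption first, forced kill as fallback.
public class PreemptThenKillSketch {

  interface ContainerHandle {
    boolean isRunning();
    void requestRelease();   // work-preserving: ask the AM to checkpoint/release
    void kill();             // fallback: forcibly reclaim the resources
  }

  private final long gracePeriodMs;
  private final Map<ContainerHandle, Long> warnedAt = new HashMap<ContainerHandle, Long>();

  public PreemptThenKillSketch(long gracePeriodMs) {
    this.gracePeriodMs = gracePeriodMs;
  }

  // Invoked periodically by the preemption check.
  public void preempt(ContainerHandle container, long nowMs) {
    Long firstWarning = warnedAt.get(container);
    if (firstWarning == null) {
      warnedAt.put(container, nowMs);
      container.requestRelease();              // give the AM a chance to save work
    } else if (container.isRunning() && nowMs - firstWarning > gracePeriodMs) {
      container.kill();                        // the AM did not respond in time
      warnedAt.remove(container);
    }
  }
}
{code}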

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629660#comment-13629660
 ] 

Carlo Curino commented on YARN-45:
--

[~kkambatl], yes ResourceRequests can be used to capture locality preferences. 
In our first use we focus on capacity, so the RM policies are not very 
picky/aware of location, but we think it is good to build this into the 
protocol for later use (as commented above somewhere). 

(As for the last comment: we moved YARN-567, YARN-568, YARN-569 that will use 
this protocol into YARN-397, while this one is probably part of YARN-386).
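
To make the shape of such a message concrete, here is a purely illustrative
sketch (field and type names are hypothetical, not the committed protocol): a
strict list of containers the RM will reclaim regardless, plus a negotiable
part whose entries can carry locality, much like ResourceRequests.

{code}
import java.util.List;

// Hypothetical shape of the RM->AM preemption feedback discussed here.
public final class PreemptionMessageSketch {

  // "Strict" part: specific containers the RM will reclaim regardless.
  private final List<String> containersToRelease;

  // "Flexible" part: an amount of resources, optionally with locality expressed
  // as ResourceRequest-like entries, that the AM may satisfy however it prefers.
  private final List<ResourceAsk> negotiableAsks;

  public static final class ResourceAsk {
    public final String location;   // e.g. host, rack, or "*" for any
    public final int memoryMb;
    public final int containers;

    public ResourceAsk(String location, int memoryMb, int containers) {
      this.location = location;
      this.memoryMb = memoryMb;
      this.containers = containers;
    }
  }

  public PreemptionMessageSketch(List<String> containersToRelease,
                                 List<ResourceAsk> negotiableAsks) {
    this.containersToRelease = containersToRelease;
    this.negotiableAsks = negotiableAsks;
  }

  public List<String> getContainersToRelease() { return containersToRelease; }

  public List<ResourceAsk> getNegotiableAsks() { return negotiableAsks; }
}
{code}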

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-45:
---

Issue Type: Improvement  (was: Sub-task)
Parent: (was: YARN-386)

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-45:
---

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-397

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-45:
-

Attachment: YARN-45.patch

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-45:
-

Attachment: (was: YARN-45.patch)

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-45:
-

Attachment: YARN-45.patch

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-45:
-

Attachment: (was: YARN-45.patch)

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-45:
-

Attachment: YARN-45.patch

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629707#comment-13629707
 ] 

Hadoop QA commented on YARN-45:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578339/YARN-45.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/723//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/723//console

This message is automatically generated.

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-482) FS: Extend SchedulingMode to intermediate queues

2013-04-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-482:
--

Attachment: yarn-482.patch

Here is a preliminary patch that
# Renames SchedulingMode to SchedulingPolicy, as policy seems to be a more apt 
name
# Extends setting the SchedulingPolicy to intermediate queues
# Fixes the previously broken assignContainer() hierarchy to include 
intermediate queues (a rough sketch of the idea follows below)
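
A rough sketch of the third point, under hypothetical simplified types (this
is not the actual FairScheduler code): the parent queue orders its children
with its own policy's comparator and then delegates assignContainer down the
hierarchy.

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of an intermediate queue delegating via its policy order.
public class ParentQueueSketch {

  interface Schedulable { boolean assignContainer(String node); }

  private final List<Schedulable> children;
  private final Comparator<Schedulable> policyComparator;  // e.g. fair-share or DRF order

  public ParentQueueSketch(List<Schedulable> children,
                           Comparator<Schedulable> policyComparator) {
    this.children = children;
    this.policyComparator = policyComparator;
  }

  public boolean assignContainer(String node) {
    Collections.sort(children, policyComparator);   // most "deserving" child first
    for (Schedulable child : children) {
      if (child.assignContainer(node)) {            // recurse until a leaf places a container
        return true;
      }
    }
    return false;
  }
}
{code}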

 FS: Extend SchedulingMode to intermediate queues
 

 Key: YARN-482
 URL: https://issues.apache.org/jira/browse/YARN-482
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-482.patch


 FS allows setting {{SchedulingMode}} for leaf queues. Extending this to 
 non-leaf queues allows using different kinds of fairness: e.g., root can have 
 three child queues - fair-mem, drf-cpu-mem, and drf-cpu-disk-mem - taking 
 different numbers of resources into account. In turn, this lets users decide 
 on the trade-off between scheduling latency and the sophistication of the 
 scheduling mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629742#comment-13629742
 ] 

Bikas Saha commented on YARN-514:
-

Looks good overall. Minor tab issues in the patch.

I don't think we want to change the enum values in the proto.

Please prepare a MAPREDUCE side patch for MAPREDUCE-5140. These need to go in 
together.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629745#comment-13629745
 ] 

Bikas Saha commented on YARN-514:
-

For MAPREDUCE-5140 please check for uses of both NEW and SUBMITTED in order to 
find out places where NEW_SAVING would need to be handled.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629782#comment-13629782
 ] 

Zhijie Shen commented on YARN-514:
--

@Bikas, the enum values in the proto need to be changed because 
YarnApplicationStateProto will be used by the application report. MR may also 
need it when converting from the YARN state to the MR state.
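
As a purely illustrative example of the kind of conversion being discussed
(the enums and mapping below are hypothetical, not the real MR/YARN code),
adding a NEW_SAVING value means every such switch needs a new case:

{code}
// Hypothetical sketch: a new NEW_SAVING value has to be handled wherever
// YARN application states are converted (e.g. on the MR side).
public final class StateMappingSketch {

  enum YarnAppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

  enum ClientJobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

  static ClientJobState toClientState(YarnAppState state) {
    switch (state) {
      case NEW:
      case NEW_SAVING:   // treated like NEW/SUBMITTED until the store completes
      case SUBMITTED:
      case ACCEPTED:
        return ClientJobState.PREP;
      case RUNNING:
        return ClientJobState.RUNNING;
      case FINISHED:
        return ClientJobState.SUCCEEDED;
      case FAILED:
        return ClientJobState.FAILED;
      case KILLED:
        return ClientJobState.KILLED;
      default:
        throw new IllegalArgumentException("Unknown state: " + state);
    }
  }

  private StateMappingSketch() {}
}
{code}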

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-514:
-

Attachment: YARN-514.4.patch

Fixed the incorrect indentation.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-11 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629806#comment-13629806
 ] 

Carlo Curino commented on YARN-45:
--

Note: we don't have tests, as there are no tests for the rest of the 
protocolbuffer messages either (this would mostly consist of validating 
auto-generated code).  

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs

2013-04-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629819#comment-13629819
 ] 

Xuan Gong commented on YARN-441:


Patch 3 self-review:
1. For each record API, we should only have a getter and a setter. We can keep 
the getter and setter that return or take the whole list.
2. For the methods that get, set, or remove one item from the list, or that 
addAll, removeAll, or clear the whole list, callers can simply get the whole 
list first and then perform the get, set, remove, or clear on it. So those 
methods can be removed (see the sketch below).
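
A small illustrative sketch of the before/after (AllocateRequestSketch is a
hypothetical stand-in, not the real AllocateRequest API):

{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the cleanup: keep only whole-list getter/setter.
public class CollectionMethodCleanupSketch {

  interface AllocateRequestSketch {
    List<String> getAskList();              // kept: whole-list getter
    void setAskList(List<String> asks);     // kept: whole-list setter
    // removed: getAsk(int i), getAskCount(), addAsk(..), addAllAsks(..), clearAsks()
  }

  static void example(AllocateRequestSketch request) {
    // Callers that used the removed convenience methods just work on the list:
    request.setAskList(Arrays.asList("ask-1", "ask-2"));
    int count = request.getAskList().size();     // instead of getAskCount()
    String first = request.getAskList().get(0);  // instead of getAsk(0)
    System.out.println(count + " asks, first = " + first);
  }
}
{code}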

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch


 There's a bunch of unused methods like getAskCount() and getAsk(index) in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN they were found in the following (MR will have its own set):
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629818#comment-13629818
 ] 

Hadoop QA commented on YARN-514:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578361/YARN-514.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/724//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/724//console

This message is automatically generated.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs

2013-04-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-441:
---

Attachment: YARN-441.4.patch

create new patch based on the self-review comments on patch3

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, 
 YARN-441.4.patch


 There's a bunch of unused methods like getAskCount() and getAsk(index) in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN they were found in the following (MR will have its own set):
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl

2013-04-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629827#comment-13629827
 ] 

Xuan Gong commented on YARN-457:


We also need to add this.updatedNodes.clear() before we actually add all the 
updatedNodes.
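
For illustration, a minimal null-safe setter along these lines (a sketch with
String standing in for NodeReport, not the actual AllocateResponsePBImpl code):

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: avoid the NPE when the internal list is still null,
// and clear any previously set nodes before adding the new ones.
public class UpdatedNodesSketch {

  private List<String> updatedNodes;   // lazily initialized, may still be null

  public synchronized void setUpdatedNodes(final List<String> updatedNodes) {
    if (updatedNodes == null) {
      this.updatedNodes = null;        // no NPE: do not call clear() on a null field
      return;
    }
    if (this.updatedNodes == null) {
      this.updatedNodes = new ArrayList<String>();
    }
    this.updatedNodes.clear();         // drop any previously set nodes first
    this.updatedNodes.addAll(updatedNodes);
  }

  public synchronized List<String> getUpdatedNodes() {
    return updatedNodes;
  }
}
{code}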

 Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
 

 Key: YARN-457
 URL: https://issues.apache.org/jira/browse/YARN-457
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Kenji Kikushima
Priority: Minor
  Labels: Newbie
 Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457.patch


 {code}
 if (updatedNodes == null) {
   this.updatedNodes.clear();
   return;
 }
 {code}
 If updatedNodes is already null, a NullPointerException is thrown.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-561:
--

Assignee: Xuan Gong  (was: Omkar Vinit Joshi)

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as the containerId, NodeManager hostname, and NodeManager 
 port is not set in the environment when a container is launched. 
 For an AM, the RM does all of this for it, but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At a minimum, the container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the NodeManager-related information 
 would also come in handy. 
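
For illustration, a minimal sketch of what an AM currently has to do by hand
(the environment variable names here are made up for the example, not a
defined convention):

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the AM pushes container/NM coordinates into the
// launch environment itself because nothing sets them automatically.
public class ContainerEnvSketch {

  static Map<String, String> buildEnvironment(String containerId,
                                              String nmHost, int nmPort) {
    Map<String, String> env = new HashMap<String, String>();
    env.put("CONTAINER_ID", containerId);   // so the task can identify itself
    env.put("NM_HOST", nmHost);             // so it can reach its local NodeManager
    env.put("NM_PORT", String.valueOf(nmPort));
    return env;
  }

  public static void main(String[] args) {
    Map<String, String> env = buildEnvironment(
        "container_1365634521100_0001_01_000002", "nm-host.example.com", 45454);
    System.out.println(env);
  }
}
{code}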

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira