date:20130814

[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740752#comment-13740752
 ] 

Hadoop QA commented on YARN-1044:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598163/yarn-1044.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1722//console

This message is automatically generated.

> used/min/max resources do not display info in the scheduler page
> 
>
> Key: YARN-1044
> URL: https://issues.apache.org/jira/browse/YARN-1044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Sangjin Lee
>Priority: Minor
>  Labels: newbie
> Attachments: screenshot.png, yarn-1044.patch
>
>
> Go to the scheduler page in the RM and click any queue to display the detailed 
> info. You'll find that none of the resource entries (used, min, or max) 
> display values.
> This is because the values contain angle brackets ("<" and ">") and are not 
> properly HTML-escaped.
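
The escaping problem described above can be illustrated with a minimal sketch. The class name `HtmlEscapeSketch` and its `escape` method are hypothetical and are not the code in the attached patch; the sample value only mimics a bracketed resource string, and its exact contents are an assumption.

```java
// Minimal sketch: replace the characters that HTML treats as markup.
// HtmlEscapeSketch and escape() are illustrative names, not Hadoop code.
final class HtmlEscapeSketch {

  static String escape(String raw) {
    StringBuilder out = new StringBuilder(raw.length() + 16);
    for (int i = 0; i < raw.length(); i++) {
      char c = raw.charAt(i);
      switch (c) {
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '&': out.append("&amp;"); break;
        case '"': out.append("&quot;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Assumed sample value. Written to a page unescaped, a browser would
    // treat it as an unknown tag and render nothing.
    String used = "<memory:2048>";
    System.out.println(escape(used));
  }
}
```

With the brackets replaced by entities, the browser renders the value as text instead of trying to parse it as a tag.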

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1044:
--

Attachment: yarn-1044.patch


[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1044:
--

Attachment: (was: yarn-1044.patch)


[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740745#comment-13740745
 ] 

Hadoop QA commented on YARN-1044:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598160/yarn-1044.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1721//console

This message is automatically generated.


[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1044:
--

Attachment: yarn-1044.patch


[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1044:
--

Attachment: (was: yarn-1044.patch)


[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type

2013-08-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740735#comment-13740735
 ] 

Bikas Saha commented on YARN-1004:
--

IMO the intent of yarn.scheduler.max is to be an admin setting that restricts 
how much resource can be given to any one container. It's exposed via the public 
YARN API. All schedulers are supposed to enforce the admin value and not 
determine it for themselves. Hence I don't think it should be scheduler-specific.
yarn.scheduler.min is scheduler-internal logic for how it simplifies the 
bin-packing problem. Both current schedulers use it, and it's not exposed by any 
YARN API because it's of no use to the user. We may split it to be 
scheduler-specific, but do we really see either scheduler not using it in the 
foreseeable future? Perhaps we are causing more grief than good by splitting them.
increment-allocation-mb is used only in the Fair Scheduler. Let's just rename 
that to be scheduler-specific.
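
The two roles described in this comment can be sketched as follows, assuming the minimum acts as a rounding granularity and the maximum as a hard cap on any single container. The class and method names are hypothetical, and actual schedulers may normalize requests differently.

```java
// Sketch only: AllocationSketch and normalizeMb are illustrative names,
// and the rounding rule is an assumption, not a scheduler's actual code.
final class AllocationSketch {

  /** Rounds a request up to a multiple of minMb, then caps it at maxMb. */
  static int normalizeMb(int requestedMb, int minMb, int maxMb) {
    if (minMb <= 0 || maxMb < minMb) {
      throw new IllegalArgumentException("need 0 < minMb <= maxMb");
    }
    int atLeastMin = Math.max(requestedMb, minMb);
    // Integer ceiling division, then scale back up to a multiple of minMb.
    int roundedUp = ((atLeastMin + minMb - 1) / minMb) * minMb;
    return Math.min(roundedUp, maxMb);
  }

  public static void main(String[] args) {
    System.out.println(normalizeMb(1500, 1024, 8192));
    System.out.println(normalizeMb(10000, 1024, 8192));
  }
}
```

Rounding to a fixed granularity reduces the number of distinct container sizes the scheduler has to pack, which is one way to read the "bin-packing" remark above.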


> yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed 
> with the scheduler type 
> --
>
> Key: YARN-1004
> URL: https://issues.apache.org/jira/browse/YARN-1004
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-1004.patch
>
>
> As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific 
> configuration, and functions differently for the Fair and Capacity 
> schedulers, it would be less confusing for the config names to include the 
> scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, 
> yarn.scheduler.capacity.minimum-allocation-mb, and 
> yarn.scheduler.fifo.minimum-allocation-mb.
> The same goes for yarn.scheduler.increment-allocation-mb, which only exists 
> for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for 
> consistency.
> If we wish to preserve backwards compatibility, we can deprecate the old 
> configs to the new ones. 
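
The backwards-compatibility idea in the description can be sketched as a lookup from old key to new key. This is a self-contained illustration with hypothetical names (`DeprecationSketch`, `deprecate`); it does not use Hadoop's own `Configuration` deprecation support, and the value `1024` is just a sample.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a tiny key/value store that redirects deprecated keys
// to their replacements. Not Hadoop's Configuration class.
final class DeprecationSketch {
  private final Map<String, String> deprecatedToNew = new HashMap<>();
  private final Map<String, String> values = new HashMap<>();

  void deprecate(String oldKey, String newKey) {
    deprecatedToNew.put(oldKey, newKey);
  }

  void set(String key, String value) {
    values.put(resolve(key), value);
  }

  String get(String key) {
    return values.get(resolve(key));
  }

  // Maps a deprecated key to its replacement; other keys pass through.
  private String resolve(String key) {
    return deprecatedToNew.getOrDefault(key, key);
  }

  public static void main(String[] args) {
    DeprecationSketch conf = new DeprecationSketch();
    conf.deprecate("yarn.scheduler.minimum-allocation-mb",
        "yarn.scheduler.fair.minimum-allocation-mb");
    // An existing site file that still sets the old key keeps working.
    conf.set("yarn.scheduler.minimum-allocation-mb", "1024");
    System.out.println(conf.get("yarn.scheduler.fair.minimum-allocation-mb"));
  }
}
```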


[jira] [Commented] (YARN-1048) Add new AMRMClientAsync.getMatchingRequests method taking a Container as parameter

2013-08-14 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740731#comment-13740731
 ] 

Hitesh Shah commented on YARN-1048:
---

[~josephkniest] That would be org.apache.hadoop.yarn.api.records.Container. 
General information on the api package - it will be restricted to classes 
within the api layer and nothing from other server-side/impl packages.

> Add new AMRMClientAsync.getMatchingRequests method taking a Container as 
> parameter
> --
>
> Key: YARN-1048
> URL: https://issues.apache.org/jira/browse/YARN-1048
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>
> The current method signature {{getMatchingRequests(Priority priority, String 
> resourceName, Resource resource)}} is cumbersome to use within 
> {{onContainersAllocated(List<Container> containers)}}, as we have to 
> deconstruct the info from the received containers.
> A new signature, {{getMatchingRequests(Container container)}} would simplify 
> usage for clients.
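
The proposal can be sketched with stand-in types. Everything below is hypothetical: the nested `Container` and `Request` classes are simplified placeholders for the real records in org.apache.hadoop.yarn.api.records, and the matching rule (same priority, same resource name, request fits in the container) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: stand-in types, not the YARN client API.
final class MatchingSketch {

  static final class Container {
    final int priority;
    final String host;
    final int memoryMb;
    Container(int priority, String host, int memoryMb) {
      this.priority = priority;
      this.host = host;
      this.memoryMb = memoryMb;
    }
  }

  static final class Request {
    final int priority;
    final String resourceName;
    final int memoryMb;
    Request(int priority, String resourceName, int memoryMb) {
      this.priority = priority;
      this.resourceName = resourceName;
      this.memoryMb = memoryMb;
    }
  }

  private final List<Request> outstanding = new ArrayList<>();

  void addRequest(Request r) {
    outstanding.add(r);
  }

  // Existing style: the caller takes the container apart and passes each field.
  List<Request> getMatchingRequests(int priority, String resourceName, int memoryMb) {
    List<Request> matches = new ArrayList<>();
    for (Request r : outstanding) {
      if (r.priority == priority
          && r.resourceName.equals(resourceName)
          && r.memoryMb <= memoryMb) {
        matches.add(r);
      }
    }
    return matches;
  }

  // Proposed style: a convenience overload that does the deconstruction once.
  List<Request> getMatchingRequests(Container c) {
    return getMatchingRequests(c.priority, c.host, c.memoryMb);
  }

  public static void main(String[] args) {
    MatchingSketch client = new MatchingSketch();
    client.addRequest(new Request(1, "node1", 1024));
    Container allocated = new Container(1, "node1", 2048);
    System.out.println(client.getMatchingRequests(allocated).size());
  }
}
```

The overload simply delegates, so callers inside a container-allocation callback no longer repeat the field extraction for every container.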


[jira] [Commented] (YARN-1048) Add new AMRMClientAsync.getMatchingRequests method taking a Container as parameter

2013-08-14 Thread Joseph Kniest (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740729#comment-13740729
 ] 

Joseph Kniest commented on YARN-1048:
-

Hi, I'm new to Hadoop. To start on this, I'm trying to look up the 
Container class, but there are a couple that have 'yarn' in their 
package name. There's also an interface with that name. Can you provide me 
with the fully qualified name of the Container class/interface we want as the 
parameter of the aforementioned method?


[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken

2013-08-14 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740694#comment-13740694
 ] 

Hitesh Shah commented on YARN-1006:
---

[~vinodkv] [~xgong] Is this a blocker for 2.1.0?

> Nodes list web page on the RM web UI is broken
> --
>
> Key: YARN-1006
> URL: https://issues.apache.org/jira/browse/YARN-1006
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Xuan Gong
> Attachments: YARN-1006.1.patch
>
>
> The nodes web page, which lists all the connected nodes of the cluster, is 
> broken.
> 1. The page is not shown in the correct format/style.
> 2. If we restart the NM, the node list is not refreshed; the newly started NM 
> is just added to the list, and the old NM's information still remains.


[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740691#comment-13740691
 ] 

Hadoop QA commented on YARN-1006:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598150/YARN-1006.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1720//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1720//console

This message is automatically generated.


[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken

2013-08-14 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740673#comment-13740673
 ] 

Xuan Gong commented on YARN-1006:
-

Trivial patch. No tests added


[jira] [Updated] (YARN-1006) Nodes list web page on the RM web UI is broken

2013-08-14 Thread Xuan Gong (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1006:


Attachment: YARN-1006.1.patch

The page is not showing in the correct format/style because in YARN-686 we 
flattened the nodeReport and deleted Health-status from it, but we did not 
update the column index in the nodesTableInit function. 
Updating the column index should fix the issue.
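
The failure mode described here, a hard-coded column index going stale after a column is removed, can be sketched as follows. Apart from "Health-status", which the comment mentions, the column names are hypothetical, and this is not the nodesTableInit code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: shows how removing a column shifts every index after it.
final class ColumnIndexSketch {

  /** Returns a copy of the column list without the named column. */
  static List<String> without(List<String> columns, String removed) {
    List<String> copy = new ArrayList<>(columns);
    copy.remove(removed);
    return copy;
  }

  public static void main(String[] args) {
    // Hypothetical layout before the column was dropped.
    List<String> before = Arrays.asList(
        "Node Address", "Health-status", "Last health-update", "Containers");
    int hardCoded = before.indexOf("Last health-update");

    List<String> after = without(before, "Health-status");
    // The old index now selects a different column than intended.
    System.out.println(after.get(hardCoded));
    System.out.println(after.indexOf("Last health-update"));
  }
}
```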


[jira] [Updated] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type

2013-08-14 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1004:
-

Fix Version/s: 2.1.0-beta

Setting the fix version to 2.1.0-beta so we don't miss it before cutting the RC.


[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740657#comment-13740657
 ] 

Alejandro Abdelnur commented on YARN-1064:
--

The ones that caught my eye are:

{code}
YARN_PREFIX + "scheduler.minimum-allocation-mb";

YARN_PREFIX + "scheduler.minimum-allocation-vcores";

YARN_PREFIX + "scheduler.maximum-allocation-mb";

YARN_PREFIX + "scheduler.maximum-allocation-vcores";

RM_PREFIX + "scheduler.client.thread-count";

RM_PREFIX + "scheduler.monitor.enable";

RM_PREFIX + "scheduler.monitor.policies";
{code}

YARN-1004 would take care of the first 2. What about the last 3? Are they a 
false positive on my side, and is it OK that they stay in the RM? If so, we can 
close this as invalid.
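
To make the inconsistency concrete, the sketch below shows how the two prefixes expand into different key namespaces. The prefix values are written here as local constants on the assumption that they match YarnConfiguration ("yarn." and "yarn.resourcemanager."); the class name and the two derived constant names are illustrative.

```java
// Sketch only: local copies of the prefixes, assumed to mirror
// YarnConfiguration, to show where each key ends up.
final class PrefixSketch {
  static final String YARN_PREFIX = "yarn.";
  static final String RM_PREFIX = YARN_PREFIX + "resourcemanager.";

  static final String MIN_ALLOC_MB =
      YARN_PREFIX + "scheduler.minimum-allocation-mb";
  static final String CLIENT_THREADS =
      RM_PREFIX + "scheduler.client.thread-count";

  public static void main(String[] args) {
    // Both are scheduler settings, but they land under different roots.
    System.out.println(MIN_ALLOC_MB);
    System.out.println(CLIENT_THREADS);
  }
}
```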

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
>  Labels: newbie
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.


[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740650#comment-13740650
 ] 

Alejandro Abdelnur commented on YARN-1055:
--

[~bikassaha], [~vinodkv], in Hadoop 1, because the RM and MRAM logic are 
handled by a single component, the JT, there is no need for this additional 
setting. Because in Hadoop 2 the failure can be of the AM or the RM, we need to 
be able to detect which one failed. This is a regression of functionality that 
should be addressed. I would use the same argument being used in MAPREDUCE-5311 
in favor of keeping around, in Hadoop 2, functionality from Hadoop 1 that users 
rely on. Eventually Oozie and component clients will evolve to fully leverage 
YARN capabilities, but that will take a while; until then we have to lend a hand 
and provide stopgaps.

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.
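
The idea of two separate limits can be sketched as a small policy object. This is purely illustrative: the class, the enum, and both limits are hypothetical and do not correspond to existing YARN classes or configuration keys.

```java
// Sketch only: one counter and one limit per failure cause, instead of
// a single shared max-attempts value.
final class RecoveryPolicySketch {
  enum Cause { AM_FAILURE, RM_RESTART }

  private final int maxAmRelaunches;
  private final int maxRmRecoveries;
  private int amRelaunches;
  private int rmRecoveries;

  RecoveryPolicySketch(int maxAmRelaunches, int maxRmRecoveries) {
    this.maxAmRelaunches = maxAmRelaunches;
    this.maxRmRecoveries = maxRmRecoveries;
  }

  /** Returns true if the app may be started again after the given cause. */
  boolean mayRestart(Cause cause) {
    if (cause == Cause.AM_FAILURE) {
      if (amRelaunches >= maxAmRelaunches) {
        return false;
      }
      amRelaunches++;
      return true;
    }
    if (rmRecoveries >= maxRmRecoveries) {
      return false;
    }
    rmRecoveries++;
    return true;
  }

  public static void main(String[] args) {
    // An app that must not be re-run after its own AM fails,
    // but should survive RM restarts.
    RecoveryPolicySketch launcher = new RecoveryPolicySketch(0, 2);
    System.out.println(launcher.mayRestart(Cause.RM_RESTART));
    System.out.println(launcher.mayRestart(Cause.AM_FAILURE));
  }
}
```

Keeping the counters apart lets an application opt out of AM retries without also giving up recovery after an RM restart, which a single shared limit cannot express.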


[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740624#comment-13740624
 ] 

Hudson commented on YARN-1056:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4263 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4263/])
YARN-1056. Remove dual use of string 'resourcemanager' in 
yarn.resourcemanager.connect.{max.wait.secs|retry_interval.secs}. Contributed 
by Karthik Kambatla. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514135)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


> Fix configs 
> yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
> 
>
> Key: YARN-1056
> URL: https://issues.apache.org/jira/browse/YARN-1056
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Trivial
>  Labels: conf
> Fix For: 2.1.0-beta
>
> Attachments: yarn-1056-1.patch, yarn-1056-1.patch, yarn-1056-2.patch
>
>
> Fix configs 
> yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
>  to have a *resourcemanager* only once, make them consistent with other such 
> yarn configs and add entries in yarn-default.xml


[jira] [Updated] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}

2013-08-14 Thread Arun C Murthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1056:


Priority: Trivial  (was: Blocker)


[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740601#comment-13740601
 ] 

Vinod Kumar Vavilapalli commented on YARN-1064:
---

Can you list the specific changes that you are proposing? Asking because some of 
the configs that are common to all schedulers are termed RM configs.

BTW, I did think of fixing all config names, but felt it was too late. If 
possible, we should avoid it. If only we paid more attention in reviews, we 
wouldn't need these major configuration name surgeries. I'm leaning 
towards keeping the names as they are, instead of changing them now and 
creating lots of confusion. I'd also request that everyone take more care before 
giving a +1 to patches that introduce config names; we should definitely have a 
config name guide.


[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type

2013-08-14 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740597#comment-13740597
 ] 

Vinod Kumar Vavilapalli commented on YARN-1004:
---

Like I mentioned, Fifo and CS both already depend on it, and it has the same 
meaning in both. Only FairScheduler diverges in meaning. Adding a new tag in the 
name is akin to adding more description. Let's not break the minimum config now, 
particularly given MAPREDUCE-5311's dependency on a min-allocation config.


[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740576#comment-13740576
 ] 

Vinod Kumar Vavilapalli commented on YARN-1055:
---

Same here :) We do really understand the underlying issue; I was just trying to 
converge on the correct solution. To summarize:
 - For the restart case, work-preserving restart solves the problem of not 
killing the AM unnecessarily.
 - For node failures or the AM crashing, we already have a knob. To avoid 
split-brain issues, Oozie/Pig/Hive all need to implement restartability for 
their launchers.

Given that the latter isn't coming anytime soon, we should make Oozie set 
max-attempts to 1 for the launcher.

Regarding the question of dependent AMs: dependent or not, YARN will only 
restart AM by AM. If the apps care, they need to implement recoverability.

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740500#comment-13740500
 ] 

Bikas Saha commented on YARN-1055:
--

That's exactly what I was trying to say earlier: RM restart is not creating 
a new problem here, and we don't need a restart-specific config. The 
restart-specific case will be solved when we put in work-preserving restart 
shortly. The generic problem still exists and needs to be fixed in Oozie, 
because only Oozie knows which parts of the pipeline can be restarted and 
which cannot.

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740495#comment-13740495
 ] 

Karthik Kambatla commented on YARN-1055:


Thinking more about it, the issue is not limited to RM failure. This happens 
even in the case where a node running the launcher goes down. The underlying 
issue seems to be in handling the dependency between AMs and wanting to 
tolerate failures of some of these AMs and not others.

Given that adding the config won't solve the issue completely, I agree that it 
is not a good idea to fix it for RM restart alone. Thanks Bikas, Vinod, Hitesh, 
Alejandro for the detailed discussion.

The issue, however, exists with dependent AMs and needs to be handled, maybe 
in Oozie for now? In the long term, would it make any sense for YARN to support 
inter-dependent AMs?


> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740474#comment-13740474
 ] 

Bikas Saha commented on YARN-1055:
--

Why does the launcher not retry the action? Is there a JIRA in Oozie to make it 
work properly in such cases by doing its own book-keeping? Isn't it more correct 
to fix Oozie instead of adding a workaround config in YARN?
Is the current situation acceptable as a known short-term bug? From what I see, 
nothing wrong will happen functionally/practically. In infrequent cases of the 
action-AM node crashing, the pipeline would have to be restarted. We have a 
design for work-preserving RM restart that can be completed post-beta. This 
will remove the need to restart AMs. Given that, I am really averse to adding 
a short-term workaround API in AppSubmissionContext that will have to be 
maintained until YARN-3.0 comes out, because we are guaranteeing APIs post-beta.

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740446#comment-13740446
 ] 

Robert Kanter commented on YARN-1055:
-

Another way of phrasing this: when the action's AM dies, we want to recover it 
(and the launcher can still monitor it with JobClient as-is), but if the action 
and launcher AMs both die due to an RM restart, we don't want to recover the 
action's AM.  Hence in the first case, we'd want the max-am-retries set to >1 
and in the second case we'd want it set to =1.  But it can't be both.
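
The conflict can be made concrete with a small model. This is only a sketch of the argument above, not the RM's actual recovery logic: it assumes the relaunch decision looks at nothing but the attempt count, and all class and method names are made up for illustration.

```java
// Simplified model of the conflict described above; not the RM's actual logic.
final class MaxAttemptsSketch {
    enum Cause { AM_FAILURE, RM_RESTART }

    /** With a single knob, the relaunch decision cannot depend on the cause. */
    static boolean relaunches(int maxAttempts, int attemptsSoFar, Cause cause) {
        return attemptsSoFar < maxAttempts; // 'cause' is deliberately ignored
    }

    /** What the action wants: recover after an AM failure, not after an RM restart. */
    static boolean wanted(Cause cause) {
        return cause == Cause.AM_FAILURE;
    }

    /** True if some value in [1, limit] yields the wanted behaviour for both causes. */
    static boolean singleValueSuffices(int limit) {
        for (int max = 1; max <= limit; max++) {
            boolean ok = true;
            for (Cause c : Cause.values()) {
                // One attempt has already run when the failure happens.
                ok &= relaunches(max, 1, c) == wanted(c);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(singleValueSuffices(10)); // prints false
    }
}
```

A value of 1 never relaunches and any larger value relaunches in both cases, so no single setting matches what the action wants.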

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740436#comment-13740436
 ] 

Karthik Kambatla commented on YARN-1055:


This problem doesn't exist in Hadoop-1 because JobTracker plays the role of RM 
and AM. 

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740437#comment-13740437
 ] 

Sangjin Lee commented on YARN-1044:
---

By the way, the xml issue with the capacity scheduler should be fixed, but it's 
a somewhat separate problem that would call for a different solution 
(jaxb-specific). I think it should be a separate ticket.

> used/min/max resources do not display info in the scheduler page
> 
>
> Key: YARN-1044
> URL: https://issues.apache.org/jira/browse/YARN-1044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Sangjin Lee
>Priority: Minor
>  Labels: newbie
> Attachments: screenshot.png, yarn-1044.patch
>
>
> Go to the scheduler page in RM, and click any queue to display the detailed 
> info. You'll find that none of the resources entries (used, min, or max) 
> would display values.
> It is because the values contain brackets ("<" and ">") and are not properly 
> html-escaped.

[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740439#comment-13740439
 ] 

Hadoop QA commented on YARN-1044:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598094/yarn-1044.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1719//console

This message is automatically generated.

> used/min/max resources do not display info in the scheduler page
> 
>
> Key: YARN-1044
> URL: https://issues.apache.org/jira/browse/YARN-1044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Sangjin Lee
>Priority: Minor
>  Labels: newbie
> Attachments: screenshot.png, yarn-1044.patch
>
>
> Go to the scheduler page in RM, and click any queue to display the detailed 
> info. You'll find that none of the resources entries (used, min, or max) 
> would display values.
> It is because the values contain brackets ("<" and ">") and are not properly 
> html-escaped.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740435#comment-13740435
 ] 

Karthik Kambatla commented on YARN-1055:


In Hadoop 1, we set the job.recovery.enable setting to true for the launcher 
job and false for the action job. When JT restarts, the launcher alone is 
recovered. The recovered launcher then starts the action exactly the same way 
as before.

In Hadoop 2, that translates to setting the max-am-retries to > 1 for the 
launcher job and = 1 for the action job. When RM restarts, the launcher alone 
is recovered, which restarts the action. However, if the action-AM alone dies 
(due to the node running it crashing etc.) and the launcher-AM doesn't, the 
launcher does not retry the action. IOW, the failure is ignored. 

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1044:
--

Attachment: yarn-1044.patch

Proposed patch for escaping invalid characters for html.
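
For readers without the attachment, the kind of escaping involved can be sketched as follows. This is an illustration only and not the contents of yarn-1044.patch; the class name, the method name, and the sample resource string are assumptions.

```java
// Illustrative sketch only; not the contents of yarn-1044.patch.
final class HtmlEscapeSketch {
    /** Replaces the characters that are significant in HTML with entities. */
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed example of a resource value as it might reach the scheduler page.
        String used = "<memory:2048, vCores:2>";
        System.out.println(escapeHtml(used)); // prints &lt;memory:2048, vCores:2&gt;
    }
}
```

Without the escaping, a browser treats the bracketed value as an unknown tag and renders nothing, which matches the symptom in the description.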

> used/min/max resources do not display info in the scheduler page
> 
>
> Key: YARN-1044
> URL: https://issues.apache.org/jira/browse/YARN-1044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Sangjin Lee
>Priority: Minor
>  Labels: newbie
> Attachments: screenshot.png, yarn-1044.patch
>
>
> Go to the scheduler page in RM, and click any queue to display the detailed 
> info. You'll find that none of the resources entries (used, min, or max) 
> would display values.
> It is because the values contain brackets ("<" and ">") and are not properly 
> html-escaped.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740367#comment-13740367
 ] 

Bikas Saha commented on YARN-1055:
--

How does it work in hadoop 1 then? From what I see the externally visible 
behavior of JT and RM is identical in both cases.

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740362#comment-13740362
 ] 

Karthik Kambatla commented on YARN-1055:


As in my comment above 
(https://issues.apache.org/jira/browse/YARN-1055?focusedCommentId=13737487&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13737487),
 max-am-retries is not enough for Oozie without significant changes to how the 
Oozie launcher is implemented.

To work around this, the Oozie launcher will have to monitor the action and 
re-submit it if the action AM fails.
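
A rough sketch of what such a monitor-and-resubmit loop could look like. The `submitAndWait` callback is hypothetical (it stands in for "submit the action and block until it finishes, returning whether it succeeded"); this is not Oozie code, and it ignores whether the action is safe to run twice.

```java
import java.util.function.BooleanSupplier;

// Hypothetical launcher-side retry loop; not taken from Oozie.
final class LauncherRetrySketch {
    /**
     * Runs the action, re-submitting it after each failure, for at most
     * maxSubmissions submissions in total.
     *
     * @return the number of submissions used, or -1 if every submission failed
     */
    static int runWithResubmit(BooleanSupplier submitAndWait, int maxSubmissions) {
        for (int attempt = 1; attempt <= maxSubmissions; attempt++) {
            if (submitAndWait.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated action whose AM dies twice before the third run succeeds.
        int used = runWithResubmit(() -> ++calls[0] >= 3, 5);
        System.out.println(used); // prints 3
    }
}
```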

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container

2013-08-14 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1065:
--

Target Version/s: 2.1.1-beta

> NM should provide AuxillaryService data to the container
> 
>
> Key: YARN-1065
> URL: https://issues.apache.org/jira/browse/YARN-1065
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.0.4-alpha
>Reporter: Bikas Saha
>
> Start container returns auxillary service data to the AM but does not provide 
> the same information to the task itself. It could add that information to the 
> container env with key=service_name and value=service_data. This allows the 
> container to start using the service without having to depend on the AM to 
> send the info to it indirectly.

[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container

2013-08-14 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1065:
--

Affects Version/s: (was: 2.0.4-alpha)

> NM should provide AuxillaryService data to the container
> 
>
> Key: YARN-1065
> URL: https://issues.apache.org/jira/browse/YARN-1065
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>
> Start container returns auxillary service data to the AM but does not provide 
> the same information to the task itself. It could add that information to the 
> container env with key=service_name and value=service_data. This allows the 
> container to start using the service without having to depend on the AM to 
> send the info to it indirectly.

[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container

2013-08-14 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1065:
--

Affects Version/s: (was: 2.1.1-beta)
   2.0.4-alpha

> NM should provide AuxillaryService data to the container
> 
>
> Key: YARN-1065
> URL: https://issues.apache.org/jira/browse/YARN-1065
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.0.4-alpha
>Reporter: Bikas Saha
>
> Start container returns auxillary service data to the AM but does not provide 
> the same information to the task itself. It could add that information to the 
> container env with key=service_name and value=service_data. This allows the 
> container to start using the service without having to depend on the AM to 
> send the info to it indirectly.

[jira] [Created] (YARN-1065) NM should provide AuxillaryService data to the container

2013-08-14 Thread Bikas Saha (JIRA)

Bikas Saha created YARN-1065:


 Summary: NM should provide AuxillaryService data to the container
 Key: YARN-1065
 URL: https://issues.apache.org/jira/browse/YARN-1065
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha


Start container returns auxillary service data to the AM but does not provide 
the same information to the task itself. It could add that information to the 
container env with key=service_name and value=service_data. This allows the 
container to start using the service without having to depend on the AM to send 
the info to it indirectly.
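
One possible shape for this, as a sketch only. It assumes the service data is available as a map from service name to ByteBuffer and that the bytes are Base64-encoded so they survive as environment variable values; the encoding choice and all names below are assumptions, not part of the proposal.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of key=service_name, value=service_data; the Base64 encoding is an assumption.
final class AuxServiceEnvSketch {
    /** Builds env entries from per-service data without disturbing the buffers. */
    static Map<String, String> toEnv(Map<String, ByteBuffer> serviceData) {
        Map<String, String> env = new LinkedHashMap<>();
        for (Map.Entry<String, ByteBuffer> e : serviceData.entrySet()) {
            ByteBuffer copy = e.getValue().duplicate(); // keep the caller's position intact
            byte[] bytes = new byte[copy.remaining()];
            copy.get(bytes);
            env.put(e.getKey(), Base64.getEncoder().encodeToString(bytes));
        }
        return env;
    }

    /** What a container would do to recover the raw service data from its env. */
    static byte[] fromEnv(String value) {
        return Base64.getDecoder().decode(value);
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> data = new LinkedHashMap<>();
        // "example_service" and its payload are made-up sample values.
        data.put("example_service", ByteBuffer.wrap("13562".getBytes(StandardCharsets.UTF_8)));
        Map<String, String> env = toEnv(data);
        System.out.println(new String(fromEnv(env.get("example_service")), StandardCharsets.UTF_8));
    }
}
```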

[jira] [Commented] (YARN-1049) ContainerExistStatus and ContainerState are defined incorrectly

2013-08-14 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740265#comment-13740265
 ] 

Zhijie Shen commented on YARN-1049:
---

bq. ContainerExitStatus defines a few constant with special exit status values 
(0,-1000, -100, -101). This is incorrect, we should not define any special 
constants and limit to return the actual process exist status code.

In addition to ContainerExitStatus, ExitCode also defines 137 and 143. However, 
apart from these values, a container's exit code usually comes from the exit 
value of its process. Are you concerned that the value from the process may 
conflict with the self-defined ones?
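
To make the possible overlap concrete, here is a small sketch that classifies a status value. The four special values are the ones quoted above; the constant names are illustrative, and reading 137 and 143 as 128 plus a signal number (9 and 15) follows the usual shell convention, which is an assumption about how those two values arise.

```java
// Sketch only: constant names are illustrative, values are those quoted in the thread.
final class ExitStatusSketch {
    static final int SUCCESS = 0;
    static final int INVALID = -1000;
    static final int ABORTED = -100;
    static final int DISKS_FAILED = -101;

    static String describe(int status) {
        switch (status) {
            case SUCCESS: return "success";
            case INVALID: return "special value -1000";
            case ABORTED: return "special value -100";
            case DISKS_FAILED: return "special value -101";
            default: break;
        }
        // Shell convention: 128 + N means the process was killed by signal N.
        if (status > 128 && status <= 128 + 64) {
            return "killed by signal " + (status - 128);
        }
        return "process exit code " + status;
    }

    public static void main(String[] args) {
        System.out.println(describe(137)); // prints killed by signal 9
        System.out.println(describe(143)); // prints killed by signal 15
    }
}
```

On POSIX-like systems a process reports an exit status in the range 0 to 255, so the negative special values cannot collide with a real process exit value, while 137 and 143 in principle can.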

> ContainerExistStatus and ContainerState are defined incorrectly
> ---
>
> Key: YARN-1049
> URL: https://issues.apache.org/jira/browse/YARN-1049
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.1.0-beta
>
>
> ContainerExitStatus defines a few constant with special exit status values 
> (0,-1000, -100, -101). This is incorrect, we should not define any special 
> constants and limit to return the actual process exist status code.
> ContainerState should include PREEMPTED (when preempted by YARN), LOST (when 
> the NM crashes).
> With the current behavior it is impossible to determine if a container has 
> been preempted or lost due to an NM crash.
> Marking it as a blocker for 2.1.0 as this is an API/behavior change.

[jira] [Commented] (YARN-305) Too many 'Node offerred to app:..." messages in RM

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740264#comment-13740264
 ] 

Hadoop QA commented on YARN-305:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598064/YARN-305.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1718//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1718//console

This message is automatically generated.

> Too many 'Node offerred to app:..." messages in RM
> --
>
> Key: YARN-305
> URL: https://issues.apache.org/jira/browse/YARN-305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Lohit Vijayarenu
>Priority: Minor
> Attachments: YARN-305.1.patch
>
>
> Running fair scheduler YARN shows that RM has lots of messages like the below.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
> Node offered to app: application_1357147147433_0002 reserved: false
> {noformat}
> They don't seem to tell much, and the same line is dumped many times in the 
> RM log. It would be good to have it improved with node information or moved 
> to some other logging level with enough debug information.
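
A sketch of the change being asked for: include the node in the message and emit it below the default level. It uses java.util.logging as a stand-in so the example is self-contained; the class name, method names, and message format are assumptions and not the contents of YARN-305.1.patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in sketch using java.util.logging; not the actual AppSchedulable code.
final class OfferLogSketch {
    private static final Logger LOG = Logger.getLogger(OfferLogSketch.class.getName());

    /** Builds the message with the node included, so each line is distinguishable. */
    static String offerMessage(String node, String appId, boolean reserved) {
        return "Node " + node + " offered to app: " + appId + " reserved: " + reserved;
    }

    /** Logs at FINE (debug) level, and only builds the string when it will be used. */
    static void logOffer(String node, String appId, boolean reserved) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(offerMessage(node, appId, reserved));
        }
    }

    public static void main(String[] args) {
        // With the default INFO level this produces no output.
        logOffer("host1:8041", "application_1357147147433_0002", false);
    }
}
```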

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740249#comment-13740249
 ] 

Alejandro Abdelnur commented on YARN-1055:
--

[~rkanter], [~kkambatl], can you please see if max-am-retries is enough for 
what Oozie needs?

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740242#comment-13740242
 ] 

Bikas Saha commented on YARN-1055:
--

First of all, whatever needs to be set must be set in the AppSubmissionContext 
API for that job. Only that is job specific and this config cannot be global 
across all jobs.

By MAPREDUCE-4824 on job submission, we set a property in job conf (that is job 
specific) saying not to retry the job.
In YARN, on job submission, in the AppSubmissionContext API (that is job 
specific), we say that max-am-retries = 1.

For a job that cannot be restarted (whether due to AM crash, node crash, or RM 
restart, all of which are indistinguishable with respect to the job), the 
per-job max-am-retries needs to be set to 1. It's probably 2 weeks' worth of 
work to remove RM restart from the above list. Even after that, such a job 
needs to set max-am-retries = 1 so that the RM does not restart the job when 
the node or the AM crashes. Why does an RM-restart-related special API need to 
be added now?


> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740239#comment-13740239
 ] 

Zhijie Shen commented on YARN-1064:
---

It sounds good to unify the prefix. Would it be better to use 
"yarn.resourcemanager.scheduler"?

Shall we consider compatibility with the early 2.x versions? Maybe we can 
deprecate, but not remove, the ones beginning with YARN_PREFIX.
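
The deprecate-but-keep idea can be sketched as a lookup that accepts either name. Hadoop's Configuration class has its own deprecation support; the standalone class below only illustrates the behaviour and is not that API. The new key name is hypothetical, built from the prefix suggested above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Standalone illustration of deprecated-key fallback; not Hadoop's Configuration API.
final class DeprecatedKeySketch {
    private final Map<String, String> oldToNew = new HashMap<>();

    void deprecate(String oldKey, String newKey) {
        oldToNew.put(oldKey, newKey);
    }

    /** Reads a value by either name: the new key wins, old keys are a fallback. */
    String get(Properties conf, String key) {
        String canonical = oldToNew.getOrDefault(key, key);
        String value = conf.getProperty(canonical);
        if (value != null) {
            return value;
        }
        for (Map.Entry<String, String> e : oldToNew.entrySet()) {
            if (e.getValue().equals(canonical)) {
                value = conf.getProperty(e.getKey());
                if (value != null) {
                    return value;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DeprecatedKeySketch keys = new DeprecatedKeySketch();
        // The new name is hypothetical; only the old name exists today.
        keys.deprecate("yarn.scheduler.minimum-allocation-mb",
                "yarn.resourcemanager.scheduler.minimum-allocation-mb");
        Properties conf = new Properties();
        conf.setProperty("yarn.scheduler.minimum-allocation-mb", "1024");
        System.out.println(keys.get(conf, "yarn.resourcemanager.scheduler.minimum-allocation-mb")); // prints 1024
    }
}
```

Existing site files that still use the old name keep working, and a value set under the new name takes precedence.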

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
>  Labels: newbie
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.

[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740228#comment-13740228
 ] 

Hadoop QA commented on YARN-1008:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598055/YARN-1008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1717//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1717//console

This message is automatically generated.

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch, YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Updated] (YARN-305) Too many 'Node offerred to app:..." messages in RM

2013-08-14 Thread Lohit Vijayarenu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lohit Vijayarenu updated YARN-305:
--

Attachment: YARN-305.1.patch

I had generated the diff from an old branch. Reattaching the diff.

> Too many 'Node offerred to app:..." messages in RM
> --
>
> Key: YARN-305
> URL: https://issues.apache.org/jira/browse/YARN-305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Lohit Vijayarenu
>Priority: Minor
> Attachments: YARN-305.1.patch
>
>
> Running fair scheduler YARN shows that RM has lots of messages like the below.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
> Node offered to app: application_1357147147433_0002 reserved: false
> {noformat}
> They don't seem to tell much, and the same line is dumped many times in the RM 
> log. It would be good to improve it with node information or move it to some 
> other logging level with enough debug information.


[jira] [Updated] (YARN-305) Too many 'Node offerred to app:..." messages in RM

2013-08-14 Thread Lohit Vijayarenu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lohit Vijayarenu updated YARN-305:
--

Attachment: (was: YARN-305.1.patch)

> Too many 'Node offerred to app:..." messages in RM
> --
>
> Key: YARN-305
> URL: https://issues.apache.org/jira/browse/YARN-305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Lohit Vijayarenu
>Priority: Minor
>
> Running fair scheduler YARN shows that RM has lots of messages like the below.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
> Node offered to app: application_1357147147433_0002 reserved: false
> {noformat}
> They don't seem to tell much, and the same line is dumped many times in the RM 
> log. It would be good to improve it with node information or move it to some 
> other logging level with enough debug information.


[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740216#comment-13740216
 ] 

Sandy Ryza commented on YARN-1064:
--

I think this is only for the scheduler configs. Do you think there is a 
fundamental difference between the ones that start with 
"yarn.resourcemanager.scheduler" and the ones that start with "yarn.scheduler"?

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
>  Labels: newbie
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.


[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740211#comment-13740211
 ] 

Omkar Vinit Joshi commented on YARN-1008:
-

+1.. lgtm

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch, YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740207#comment-13740207
 ] 

Omkar Vinit Joshi commented on YARN-1064:
-

For example, RM_PREFIX clearly means the setting is for the RM; similarly, 
NM_PREFIX is for the NM, and YARN_PREFIX is for other general settings. If we 
use a common prefix, there is no point in having any prefix at all, since all 
YARN-specific configurations go into yarn-site.xml, which is meant for 
YarnConfiguration only. Let me know if you disagree.

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
>  Labels: newbie
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.


[jira] [Updated] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1064:


Labels: newbie  (was: )

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
>  Labels: newbie
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.


[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740204#comment-13740204
 ] 

Alejandro Abdelnur commented on YARN-1055:
--

[~bikassaha], 

bq. Restart on am failure is already determined by the default value of max am 
retries in yarn config. Setting that to 1 will prevent RM from restarting AM's 
on failure. Thus no need for new config. Restart after RM restart is already 
covered by setting max am retries to 1 by the app client on app submission. 

Are we talking about the same property here? If so, I don't see how you can 
differentiate between AM failure and RM restart.

bq. If an app cannot handle this situation it should create its own config and 
set the correct value of 1 on submission. YARN should not add a config IMO. If 
I remember right, this config is being imported from hadoop 1 and the impl of 
this config in hadoop 1 is what RM already does to handle user defined max am 
retries.

Oozie is using MRAM for the launcher job, so it is not a question of the AM not 
handling it. The problem is that to Oozie the MR jobs started by 
hive/distcp/sqoop are opaque until the jobs complete (it is a limitation of the 
clients of these components).

With MAPREDUCE-4824, in Hadoop 1 we can specify the number of retries for a 
task (that would be equivalent to specifying the number of AM retries) and we 
can specify whether a job is recoverable or not.

We need the equivalent of MAPREDUCE-4824 in Hadoop 2.

Unless I'm missing something, this is not available.


> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.


[jira] [Commented] (YARN-305) Too many 'Node offerred to app:..." messages in RM

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740201#comment-13740201
 ] 

Hadoop QA commented on YARN-305:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598058/YARN-305.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1716//console

This message is automatically generated.

> Too many 'Node offerred to app:..." messages in RM
> --
>
> Key: YARN-305
> URL: https://issues.apache.org/jira/browse/YARN-305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Lohit Vijayarenu
>Priority: Minor
> Attachments: YARN-305.1.patch
>
>
> Running fair scheduler YARN shows that RM has lots of messages like the below.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
> Node offered to app: application_1357147147433_0002 reserved: false
> {noformat}
> They don't seem to tell much, and the same line is dumped many times in the RM 
> log. It would be good to improve it with node information or move it to some 
> other logging level with enough debug information.


[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740198#comment-13740198
 ] 

Sandy Ryza commented on YARN-1008:
--

+1

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch, YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Updated] (YARN-305) Too many 'Node offerred to app:..." messages in RM

2013-08-14 Thread Lohit Vijayarenu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lohit Vijayarenu updated YARN-305:
--

Attachment: YARN-305.1.patch

Simple patch to change the log level to debug and add node information. I also 
saw a similar case when offering a node to a queue, so I added node information 
there as well. I could not think of a test case, as this only changes the log 
level.
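
The kind of change described above might look roughly like the sketch below. 
It is not the attached patch: the class NodeOfferLogging and its method names 
are hypothetical, and it uses java.util.logging (where FINE stands in for a 
debug level) so that it runs standalone, rather than the logging API used by 
the ResourceManager.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: log node offers at a debug-like level, with node info. */
final class NodeOfferLogging {
    private static final Logger LOG =
        Logger.getLogger(NodeOfferLogging.class.getName());

    /** Builds the message, now including the node that was offered. */
    static String offerMessage(String nodeName, String appId, boolean reserved) {
        return "Node " + nodeName + " offered to app: " + appId
            + " reserved: " + reserved;
    }

    static void logOffer(String nodeName, String appId, boolean reserved) {
        // Guard the call so the string is only built when the level is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(offerMessage(nodeName, appId, reserved));
        }
    }
}
```

At the default logging level nothing is emitted, which is the point: the line 
no longer floods the log, but it carries the node name once debugging is 
switched on.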

> Too many 'Node offerred to app:..." messages in RM
> --
>
> Key: YARN-305
> URL: https://issues.apache.org/jira/browse/YARN-305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Lohit Vijayarenu
>Priority: Minor
> Attachments: YARN-305.1.patch
>
>
> Running fair scheduler YARN shows that RM has lots of messages like the below.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: 
> Node offered to app: application_1357147147433_0002 reserved: false
> {noformat}
> They don't seem to tell much, and the same line is dumped many times in the RM 
> log. It would be good to improve it with node information or move it to some 
> other logging level with enough debug information.


[jira] [Updated] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1008:
-

Attachment: YARN-1008.patch

I missed the dot before "node"; fixed. I also added javadocs to the 
MiniYARNCluster class describing this property and how it works.
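
To illustrate why the port matters for a mini cluster, here is a small 
standalone sketch. It is not the patch itself: the class NodeKeys, its 
methods, and the port numbers are hypothetical. It only shows that several 
NodeManagers on one host collapse to a single key unless the port is part of 
the key.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: node keys with and without the port. */
final class NodeKeys {
    /** Returns "host" or "host:port" depending on the flag. */
    static String nodeKey(String host, int port, boolean includePort) {
        return includePort ? host + ":" + port : host;
    }

    /** Counts how many distinct keys a set of same-host nodes produces. */
    static int distinctKeys(String host, int[] ports, boolean includePort) {
        Set<String> keys = new HashSet<>();
        for (int port : ports) {
            keys.add(nodeKey(host, port, includePort));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        int[] ports = {10001, 10002, 10003}; // three NMs on one machine
        System.out.println(distinctKeys("localhost", ports, false)); // 1
        System.out.println(distinctKeys("localhost", ports, true));  // 3
    }
}
```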

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch, YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740178#comment-13740178
 ] 

Alejandro Abdelnur commented on YARN-1064:
--

[~ojoshi], can you please explain why it is more intuitive not to be consistent?

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2013-08-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740167#comment-13740167
 ] 

Bikas Saha commented on YARN-1063:
--

Can you please provide an overall design approach, with pros and cons, etc.?

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: trunk-win
> Environment: Windows
>Reporter: Kyle Leckie
>  Labels: security
> Fix For: trunk-win
>
> Attachments: YARN-732.patch
>
>
> Task isolation requires the ability to launch tasks in the context of a 
> particular domain user.


[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740162#comment-13740162
 ] 

Karthik Kambatla commented on YARN-1055:


[~hitesh], you are right - we should be careful in labeling failures one way or 
the other.

We should probably classify the failures from a user-perspective and then look 
into what configs are required. At the least, I see the following different 
classes:
# Non-AM container/task failures
# AM container failures
# A group of (related) AMs failing due to node failures: nodes crashing, 
network partitions, or RM failure.

Thoughts?
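
One possible way to express such a classification, with AM retries and RM 
restart recovery controlled separately, is sketched below. Everything here is 
hypothetical (the class RecoveryPolicy, the FailureKind names, and both 
settings); it is only meant to make the distinction concrete, not to describe 
existing YARN behavior or configs.

```java
/** Hypothetical sketch: decide on app restart per failure class. */
final class RecoveryPolicy {
    enum FailureKind { TASK_CONTAINER, AM_CONTAINER, RM_RESTART }

    private final int maxAmAttempts;          // hypothetical: attempts allowed for the AM
    private final boolean recoverOnRmRestart; // hypothetical: independent of AM attempts

    RecoveryPolicy(int maxAmAttempts, boolean recoverOnRmRestart) {
        this.maxAmAttempts = maxAmAttempts;
        this.recoverOnRmRestart = recoverOnRmRestart;
    }

    /** Returns true if the app should be started again after this failure. */
    boolean shouldRestartApp(FailureKind kind, int amAttemptsSoFar) {
        switch (kind) {
            case AM_CONTAINER:
                // Only AM failures consume the AM attempt budget.
                return amAttemptsSoFar < maxAmAttempts;
            case RM_RESTART:
                // An RM restart is decided by its own switch.
                return recoverOnRmRestart;
            default:
                // In this sketch, task failures never restart the whole app.
                return false;
        }
    }
}
```

For example, new RecoveryPolicy(1, true) would never retry a failed AM but 
would still recover the app after an RM restart, which a single max-attempts 
value cannot express.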

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.


[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740161#comment-13740161
 ] 

Omkar Vinit Joshi commented on YARN-1064:
-

I think this is more intuitive the way it is than using the same prefix for 
all. Thoughts?

> YarnConfiguration scheduler configuration constants are not consistent
> --
>
> Key: YARN-1064
> URL: https://issues.apache.org/jira/browse/YARN-1064
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.1.0-beta
>
>
> Some of the scheduler configuration constants in YarnConfiguration have 
> RM_PREFIX and others YARN_PREFIX. For consistency we should move all under 
> the same prefix.


[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740112#comment-13740112
 ] 

Hadoop QA commented on YARN-1008:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598040/YARN-1008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1715//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1715//console

This message is automatically generated.

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740101#comment-13740101
 ] 

Sandy Ryza commented on YARN-1008:
--

Is there a reason for using "include-port-in-node.name" and not 
"include-port-in-node-name"? Also, would it make sense to turn it on by default 
in MiniYARNCluster?  Or put some doc there to let people know about its 
existence?

Otherwise, LGTM.

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Updated] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1008:
-

Attachment: YARN-1008.patch

Addressed all comments.

Created YARN-1064 as there are some scheduler config constants that have 
different prefixes.

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, 
> YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port).


[jira] [Created] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent

2013-08-14 Thread Alejandro Abdelnur (JIRA)

Alejandro Abdelnur created YARN-1064:


 Summary: YarnConfiguration scheduler configuration constants are 
not consistent
 Key: YARN-1064
 URL: https://issues.apache.org/jira/browse/YARN-1064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.1.0-beta


Some of the scheduler configuration constants in YarnConfiguration have 
RM_PREFIX and others YARN_PREFIX. For consistency we should move all under the 
same prefix.


[jira] [Updated] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type

2013-08-14 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1004:
-

Summary: yarn.scheduler.minimum|maximum|increment-allocation-mb should be 
prefixed with the scheduler type   (was: 
yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler)

> yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed 
> with the scheduler type 
> --
>
> Key: YARN-1004
> URL: https://issues.apache.org/jira/browse/YARN-1004
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Priority: Blocker
> Attachments: YARN-1004.patch
>
>
> As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific 
> configuration, and functions differently for the Fair and Capacity 
> schedulers, it would be less confusing for the config names to include the 
> scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, 
> yarn.scheduler.capacity.minimum-allocation-mb, and 
> yarn.scheduler.fifo.minimum-allocation-mb.
> The same goes for yarn.scheduler.increment-allocation-mb, which only exists 
> for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for 
> consistency.
> If we wish to preserve backwards compatibility, we can deprecate the old 
> configs to the new ones. 


[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user

2013-08-14 Thread Kyle Leckie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Leckie updated YARN-1063:
--

Attachment: (was: YARN-732.patch)

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: trunk-win
> Environment: Windows
>Reporter: Kyle Leckie
>  Labels: security
> Fix For: trunk-win
>
> Attachments: YARN-732.patch
>
>
> Task isolation requires the ability to launch tasks in the context of a 
> particular domain user.


[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user

2013-08-14 Thread Kyle Leckie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Leckie updated YARN-1063:
--

Attachment: YARN-732.patch

code patch

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: trunk-win
> Environment: Windows
>Reporter: Kyle Leckie
>  Labels: security
> Fix For: trunk-win
>
> Attachments: YARN-732.patch
>
>
> Task isolation requires the ability to launch tasks in the context of a 
> particular domain user.


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740060#comment-13740060
 ] 

Hadoop QA commented on YARN-867:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12598022/YARN-867.1.sampleCode.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1714//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/1714//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1714//console

This message is automatically generated.

> Isolation of failures in aux services 
> --
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 


[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user

2013-08-14 Thread Kyle Leckie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Leckie updated YARN-1063:
--

Attachment: YARN-732.patch

Code patch

> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: trunk-win
> Environment: Windows
>Reporter: Kyle Leckie
>  Labels: security
> Fix For: trunk-win
>
> Attachments: YARN-732.patch
>
>
> Task isolation requires the ability to launch tasks in the context of a 
> particular domain user.


[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740048#comment-13740048
 ] 

Omkar Vinit Joshi commented on YARN-1061:
-

How can the NM wait indefinitely? What is your connection timeout set to? Can 
you add the parameters below to your log4j.properties and see whether it actually 
times out or waits indefinitely for the RM? Also, can you attach those logs once 
you have reproduced it?
{code}
log4j.logger.org.apache.hadoop.ipc.Server=DEBUG
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}

Also helpful configurations from *CommonConfigurationKeysPublic*
{code}
  public static final String IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY =
      "ipc.client.connection.maxidletime";
  /** Default value for IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY */
  public static final int IPC_CLIENT_CONNECTION_MAXIDLETIME_DEFAULT = 10000; // 10s
  /** See core-default.xml */
  public static final String IPC_CLIENT_CONNECT_TIMEOUT_KEY =
      "ipc.client.connect.timeout";
  /** Default value for IPC_CLIENT_CONNECT_TIMEOUT_KEY */
  public static final int IPC_CLIENT_CONNECT_TIMEOUT_DEFAULT = 20000; // 20s
{code}


> NodeManager is indefinitely waiting for nodeHeartBeat() response from 
> ResouceManager.
> -
>
> Key: YARN-1061
> URL: https://issues.apache.org/jira/browse/YARN-1061
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Rohith Sharma K S
>
> It is observed that in one scenario, the NodeManager waits indefinitely for 
> the nodeHeartbeat response from the ResourceManager when the ResourceManager 
> is hung.
> The NodeManager should get a timeout exception instead of waiting indefinitely.
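
One way to get the behaviour the description asks for is to bound the wait on the heartbeat call. The sketch below is illustrative only: `BoundedHeartbeat`, `callWithTimeout`, and the executor-based approach are assumptions made for this example, not how the NodeManager's RPC layer is implemented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: fail a blocking call with TimeoutException instead of hanging. */
class BoundedHeartbeat {
  /** Runs the call on a worker thread and waits at most timeoutMs for its result. */
  static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
      throws TimeoutException, ExecutionException, InterruptedException {
    ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "heartbeat-worker");
      t.setDaemon(true); // a hung call must not keep the JVM alive
      return t;
    });
    Future<T> result = worker.submit(call);
    try {
      return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      result.cancel(true); // interrupt the stuck call before reporting the timeout
      throw e;
    } finally {
      worker.shutdownNow();
    }
  }
}
```

A caller would wrap the heartbeat invocation in `callWithTimeout` and treat the `TimeoutException` as a failed heartbeat, for example by retrying or resyncing.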


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-08-14 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740042#comment-13740042
 ] 

Hitesh Shah commented on YARN-867:
--

It might be good to break this down into a set of JIRAs: the first (this JIRA 
itself) just to ensure that the NM does not crash, and the second to address the 
proposed changes in the protocol and the potential changes in the MR AM to use 
the new APIs and handle failures as needed.


> Isolation of failures in aux services 
> --
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 


[jira] [Assigned] (YARN-1006) Nodes list web page on the RM web UI is broken

2013-08-14 Thread Xuan Gong (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1006:
---

Assignee: Xuan Gong  (was: Jian He)

> Nodes list web page on the RM web UI is broken
> --
>
> Key: YARN-1006
> URL: https://issues.apache.org/jira/browse/YARN-1006
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Xuan Gong
>
> The nodes web page, which lists all the connected nodes of the cluster, is 
> broken.
> 1. The page is not displayed in the correct format/style.
> 2. If we restart the NM, the node list is not refreshed; the newly started NM 
> is just added to the list, and the old NM's information still remains.


[jira] [Updated] (YARN-867) Isolation of failures in aux services

2013-08-14 Thread Xuan Gong (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-867:
---

Attachment: YARN-867.1.sampleCode.patch

> Isolation of failures in aux services 
> --
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 


[jira] [Commented] (YARN-867) Isolation of failures in aux services

2013-08-14 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740009#comment-13740009
 ] 

Xuan Gong commented on YARN-867:


My proposal:
When any auxService fails, instead of simply throwing the exception to the 
dispatcher, we catch it and inform the AM. 

Here is how it works:

We will use ContainerManagementProtocol. The AM periodically sends an 
AuxiliaryServiceCheckRequest, with the ApplicationId as its parameter (the 
period could be set to 3s or 5s), over ContainerManagementProtocol to every 
ContainerManager that the AM knows about. Each ContainerManager responds with 
whether any AuxiliaryService has failed for this appId, along with the related 
diagnostics. 

On the ContainerManagerImpl side, if any of the registered AuxServices fails, 
instead of simply throwing the exception to the dispatcher, we catch it and 
save it, with the appId and the exception message, in an AuxServiceFailureMap. 
When a ContainerManager receives an AuxiliaryServiceCheckRequest, it looks up 
the appId in the AuxServiceFailureMap and responds with whether any AuxService 
has failed for that appId.

Attached is sample code for this proposal.
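
The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration and not the attached patch: `AuxServiceFailureTracker`, `runIsolated`, and `check` are hypothetical names, and application IDs are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch: record aux-service failures per application instead of rethrowing. */
class AuxServiceFailureTracker {
  // appId -> diagnostics for every aux-service failure seen for that application
  private final Map<String, List<String>> failures = new ConcurrentHashMap<>();

  /** Runs a service callback; any exception is caught and recorded, never propagated. */
  void runIsolated(String appId, String serviceName, Runnable callback) {
    try {
      callback.run();
    } catch (Exception e) {
      failures.computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
          .add(serviceName + ": " + e);
    }
  }

  /** Answers a check request: diagnostics for the application, empty if nothing failed. */
  List<String> check(String appId) {
    List<String> diagnostics = failures.get(appId);
    return diagnostics == null
        ? Collections.<String>emptyList()
        : new ArrayList<>(diagnostics);
  }
}
```

Because the exception never reaches the dispatcher, one application's bad input cannot stop event handling for the others; the AM learns about the failure only when it polls.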

> Isolation of failures in aux services 
> --
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Critical
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 


[jira] [Commented] (YARN-1059) '\n' or ' ' or '\t' should be ignored for some configuration parameters

2013-08-14 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740005#comment-13740005
 ] 

Zhijie Shen commented on YARN-1059:
---

This is a duplicate of HADOOP-9869.

> '\n' or ' ' or '\t' should be ignored for some configuration parameters
> ---
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
>Reporter: rvller
>Priority: Minor
>  Labels: newbie
>
> Here is the traceback while starting the YARN resource manager:
> 2013-08-12 12:53:29,319 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 
> 10.245.1.30:9030
>  (configuration property 'yarn.resourcemanager.resource-tracker.address')
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
>   at 
> org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> yarn.resourcemanager.address
>   10.245.1.30:9010
> yarn.resourcemanager.scheduler.address
>   10.245.1.30:9020
> yarn.resourcemanager.resource-tracker.address
>   10.245.1.30:9030
> yarn.resourcemanager.admin.address
>   10.245.1.30:9040
> yarn.resourcemanager.webapp.address
>   10.245.1.30:9050


[jira] [Updated] (YARN-1059) '\n' or ' ' or '\t' should be ignored for some configuration parameters

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1059:


Summary: '\n' or ' ' or '\t' should be ignored for some configuration 
parameters  (was: IllegalArgumentException while starting YARN)

> '\n' or ' ' or '\t' should be ignored for some configuration parameters
> ---
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
>Reporter: rvller
>Priority: Minor
>  Labels: newbie
>
> Here is the traceback while starting the YARN resource manager:
> 2013-08-12 12:53:29,319 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 
> 10.245.1.30:9030
>  (configuration property 'yarn.resourcemanager.resource-tracker.address')
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
>   at 
> org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> yarn.resourcemanager.address
>   10.245.1.30:9010
> yarn.resourcemanager.scheduler.address
>   10.245.1.30:9020
> yarn.resourcemanager.resource-tracker.address
>   10.245.1.30:9030
> yarn.resourcemanager.admin.address
>   10.245.1.30:9040
> yarn.resourcemanager.webapp.address
>   10.245.1.30:9050


[jira] [Commented] (YARN-1059) IllegalArgumentException while starting YARN

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1373#comment-1373
 ] 

Omkar Vinit Joshi commented on YARN-1059:
-

Modifying title

> IllegalArgumentException while starting YARN
> 
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
>Reporter: rvller
>Priority: Minor
>  Labels: newbie
>
> Here is the traceback while starting the YARN resource manager:
> 2013-08-12 12:53:29,319 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 
> 10.245.1.30:9030
>  (configuration property 'yarn.resourcemanager.resource-tracker.address')
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
>   at 
> org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> yarn.resourcemanager.address
>   10.245.1.30:9010
> yarn.resourcemanager.scheduler.address
>   10.245.1.30:9020
> yarn.resourcemanager.resource-tracker.address
>   10.245.1.30:9030
> yarn.resourcemanager.admin.address
>   10.245.1.30:9040
> yarn.resourcemanager.webapp.address
>   10.245.1.30:9050


[jira] [Commented] (YARN-1059) IllegalArgumentException while starting YARN

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739997#comment-13739997
 ] 

Omkar Vinit Joshi commented on YARN-1059:
-

Today, none of the configuration parameter reads ignore '\n', ' ', or '\t'. 
This is not very critical, so I am downgrading its priority.
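
A tolerant parse of a host:port setting could trim surrounding whitespace before validating it, as in the sketch below. `HostPortParser` is a hypothetical helper written for this example; it is not the `NetUtils.createSocketAddr` implementation that appears in the stack trace.

```java
/** Hypothetical sketch: trim '\n', ' ' and '\t' around a host:port value before parsing. */
class HostPortParser {
  /** Returns {host, port} or throws IllegalArgumentException for a malformed value. */
  static String[] parse(String raw) {
    // trim() drops leading and trailing whitespace, including newlines and tabs
    String value = raw == null ? "" : raw.trim();
    int colon = value.lastIndexOf(':');
    if (colon <= 0 || colon == value.length() - 1) {
      throw new IllegalArgumentException(
          "Does not contain a valid host:port authority: " + value);
    }
    String host = value.substring(0, colon);
    // NumberFormatException is itself an IllegalArgumentException
    int port = Integer.parseInt(value.substring(colon + 1));
    if (port < 0 || port > 65535) {
      throw new IllegalArgumentException("Port out of range: " + port);
    }
    return new String[] {host, String.valueOf(port)};
  }
}
```

With trimming in place, a value written on its own line inside the XML element parses the same as one written inline.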

> IllegalArgumentException while starting YARN
> 
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
>Reporter: rvller
>Priority: Minor
>  Labels: newbie
>
> Here is the traceback while starting the YARN resource manager:
> 2013-08-12 12:53:29,319 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 
> 10.245.1.30:9030
>  (configuration property 'yarn.resourcemanager.resource-tracker.address')
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
>   at 
> org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> yarn.resourcemanager.address
>   10.245.1.30:9010
> yarn.resourcemanager.scheduler.address
>   10.245.1.30:9020
> yarn.resourcemanager.resource-tracker.address
>   10.245.1.30:9030
> yarn.resourcemanager.admin.address
>   10.245.1.30:9040
> yarn.resourcemanager.webapp.address
>   10.245.1.30:9050


[jira] [Updated] (YARN-1059) IllegalArgumentException while starting YARN

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1059:


Labels: newbie  (was: )

> IllegalArgumentException while starting YARN
> 
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
>Reporter: rvller
>  Labels: newbie
>
> Here is the traceback while starting the YARN resource manager:
> 2013-08-12 12:53:29,319 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 
> 10.245.1.30:9030
>  (configuration property 'yarn.resourcemanager.resource-tracker.address')
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
>   at 
> org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> yarn.resourcemanager.address
>   10.245.1.30:9010
> yarn.resourcemanager.scheduler.address
>   10.245.1.30:9020
> yarn.resourcemanager.resource-tracker.address
>   10.245.1.30:9030
> yarn.resourcemanager.admin.address
>   10.245.1.30:9040
> yarn.resourcemanager.webapp.address
>   10.245.1.30:9050


[jira] [Updated] (YARN-1059) IllegalArgumentException while starting YARN

2013-08-14 Thread Omkar Vinit Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1059:


Priority: Minor  (was: Major)

> IllegalArgumentException while starting YARN
> 
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
>Reporter: rvller
>Priority: Minor
>  Labels: newbie
>
> Here is the traceback while starting the YARN resource manager:
> 2013-08-12 12:53:29,319 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port 
> authority: 
> 10.245.1.30:9030
>  (configuration property 'yarn.resourcemanager.resource-tracker.address')
>   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
>   at 
> org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> yarn.resourcemanager.address
>   10.245.1.30:9010
> yarn.resourcemanager.scheduler.address
>   10.245.1.30:9020
> yarn.resourcemanager.resource-tracker.address
>   10.245.1.30:9030
> yarn.resourcemanager.admin.address
>   10.245.1.30:9040
> yarn.resourcemanager.webapp.address
>   10.245.1.30:9050


[jira] [Created] (YARN-1063) Winutils needs ability to create task as domain user

2013-08-14 Thread Kyle Leckie (JIRA)

Kyle Leckie created YARN-1063:
-

 Summary: Winutils needs ability to create task as domain user
 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: trunk-win
 Environment: Windows
Reporter: Kyle Leckie
 Fix For: trunk-win


Task isolation requires the ability to launch tasks in the context of a 
particular domain user.


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739915#comment-13739915
 ] 

Hadoop QA commented on YARN-292:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597996/YARN-292.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1713//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1713//console

This message is automatically generated.

> ResourceManager throws ArrayIndexOutOfBoundsException while handling 
> CONTAINER_ALLOCATED for application attempt
> 
>
> Key: YARN-292
> URL: https://issues.apache.org/jira/browse/YARN-292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Zhijie Shen
> Attachments: YARN-292.1.patch, YARN-292.2.patch
>
>
> {code:xml}
> 2012-12-26 08:41:15,030 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
> Calling allocate on removed or non existant application 
> appattempt_1356385141279_49525_01
> 2012-12-26 08:41:15,031 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_ALLOCATED for applicationAttempt 
> application_1356385141279_49525
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>   at java.lang.Thread.run(Thread.java:662)
>  {code}


[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

2013-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739913#comment-13739913
 ] 

Sandy Ryza commented on YARN-1008:
--

A few comments:

Can we call the config RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME instead of 
RM_SCHEDULER_USE_PORT_FOR_NODE_NAME?  The latter makes it seem like we're only 
using the port.

Also, like in yarn.scheduler.minimum-allocation-mb, can we use dashes, not 
periods, for the part that comes after scheduler? 

Also, it should start with yarn.scheduler, not yarn.resourcemanager.scheduler.

In the getNodeName doc, "diferentiate" should be "differentiate".

The whole test added to TestFairScheduler needs another space of indentation.

The finally block at the end of the test shouldn't be necessary, because we 
reinitialize with a fresh config before every test already.
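
For illustration only, here is a minimal sketch of how a scheduler-side node name could be derived with and without the port. The class `NodeNames`, the method `nodeName`, and the key string are hypothetical and simply follow the naming suggestions above (a `yarn.scheduler` prefix with dashes afterwards); none of this is taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the code in the attached patch.
final class NodeNames {
  // Assumed key name, spelled per the suggestions above:
  // "yarn.scheduler" prefix, dashes (not periods) afterwards.
  static final String INCLUDE_PORT_KEY = "yarn.scheduler.include-port-in-node-name";

  private NodeNames() {}

  // Returns "host:port" when includePort is true, otherwise just "host".
  static String nodeName(String host, int port, boolean includePort) {
    return includePort ? host + ":" + port : host;
  }

  public static void main(String[] args) {
    // Two NodeManagers on the same host, as in a MiniYARNCluster.
    Map<String, Integer> byName = new HashMap<>();
    byName.put(nodeName("localhost", 8041, false), 1);
    byName.put(nodeName("localhost", 8042, false), 2);
    System.out.println("host only: " + byName.size() + " distinct key(s)");

    byName.clear();
    byName.put(nodeName("localhost", 8041, true), 1);
    byName.put(nodeName("localhost", 8042, true), 2);
    System.out.println("host:port: " + byName.size() + " distinct key(s)");
  }
}
```

With the port left out, both NodeManagers map to the same key, which is the collision this issue describes.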

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for 
> allocations
> ---
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch
>
>
> While the NMs are keyed using the NodeId, the allocation is done based on the 
> hostname. 
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable host:port instead of just the host for 
> allocations. The nodes reported to the AM should report the 'key' (host or 
> host:port). 


[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739900#comment-13739900
 ] 

Sandy Ryza commented on YARN-1024:
--

bq. I would also like us to have a flag that would either limit the container 
to the requested CPU and let it have no more even when more is available, or 
would let it expand to use whatever CPU was free, but would be guaranteed to 
get at least the YCUs requested.
YARN-810 should handle this.  The plan is to make it a cluster config, but feel 
free to chime in there if you think it needs to be an app config. 

bq. 1 YCU is very complex to measure for an application.
Agreed that YCUs are very complex to measure and set for applications, and I 
don't think there is any good way around this. YARN-810 will help considerably, 
but still won't make it close to as easy as configuring memory.

bq. although I think I would change the numbers to be total YCUs requested and 
minimum YCUs per core.
Because of the complexity discussed above in dealing with YCUs, I strongly 
believe that we should keep one of the parameters as just "number of cores", 
which allows a user to separate the concerns of "how much parallelism can my 
task take advantage of?" and "how CPU-bound is my task?".  This will also give 
us something in common with every other cluster resource manager I have 
surveyed (Condor, Maui, Torque, etc.).

> Define a virtual core unambigiously
> ---
>
> Key: YARN-1024
> URL: https://issues.apache.org/jira/browse/YARN-1024
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> We need to define the meaning of a virtual core unambiguously so that 
> it's easy to migrate applications between clusters.
> For example, here is the Amazon EC2 definition of an ECU: 
> http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
> equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*


[jira] [Updated] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-14 Thread Zhijie Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-292:
-

Attachment: YARN-292.2.patch

Updated the patch to add comments and an assert in AMContainerAllocatedTransition, 
to document and check that the number of allocated containers is not zero.
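
As a rough illustration of that kind of check (not the patch itself): the trace below fails inside `Arrays$ArrayList.get` at index 0, which is what happens when the first element of an empty list is requested. The sketch uses a hypothetical `AmContainerCheck` class and plain strings in place of the real container objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the check; the real transition works on container objects.
final class AmContainerCheck {
  private AmContainerCheck() {}

  // Returns the first allocated container, failing with a descriptive message
  // instead of an ArrayIndexOutOfBoundsException when nothing was allocated.
  static String firstAllocated(List<String> allocated) {
    if (allocated == null || allocated.isEmpty()) {
      throw new IllegalStateException(
          "Expected at least one allocated container for the AM");
    }
    return allocated.get(0);
  }

  public static void main(String[] args) {
    System.out.println(firstAllocated(Arrays.asList("container_01")));
    try {
      // An empty Arrays.asList(...) is the list type seen in the stack trace.
      firstAllocated(Arrays.asList(new String[0]));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```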

> ResourceManager throws ArrayIndexOutOfBoundsException while handling 
> CONTAINER_ALLOCATED for application attempt
> 
>
> Key: YARN-292
> URL: https://issues.apache.org/jira/browse/YARN-292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Zhijie Shen
> Attachments: YARN-292.1.patch, YARN-292.2.patch
>
>
> {code:xml}
> 2012-12-26 08:41:15,030 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
> Calling allocate on removed or non existant application 
> appattempt_1356385141279_49525_01
> 2012-12-26 08:41:15,031 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_ALLOCATED for applicationAttempt 
> application_1356385141279_49525
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>   at java.lang.Thread.run(Thread.java:662)
>  {code}


[jira] [Commented] (YARN-1024) Define a virtual core unambigiously

2013-08-14 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739853#comment-13739853
 ] 

Robert Joseph Evans commented on YARN-1024:
---

{quote}Sorry for the longwindedness.{quote}

From what people have told me, you still have a long way to go before you 
approach me for longwindedness :).

My initial gut reaction is that only having two numbers to express the request 
seems too simplified, but the more I think about it the more I am OK with it, 
although I think I would change the numbers to be total YCUs requested and 
minimum YCUs per core.  This gives the user better visibility into how the 
scheduler is treating these numbers so they can better reason about them. The 
total YCUs is the value used for scheduling.  The minimum YCUs per core is 
compared to the maxComputeUnitsPerCore, as was suggested, to reject a request 
as not possible or, in the case of a heterogeneous environment, to restrict the 
hosts that this container can run on.  Although I am OK with the original 
proposal too.
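
To make the two-number idea concrete, here is a minimal sketch under stated assumptions: `YcuRequestCheck`, `Node`, and `eligibleNodes` are hypothetical names that do not exist in YARN, and the sketch only models the comparison described above (total YCUs against free capacity, minimum YCUs per core against a node's maxComputeUnitsPerCore).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two-number request; none of these types exist in YARN.
final class YcuRequestCheck {
  private YcuRequestCheck() {}

  static final class Node {
    final String name;
    final int availableYcus;           // free compute units on the node
    final int maxComputeUnitsPerCore;  // capacity of a single core on the node

    Node(String name, int availableYcus, int maxComputeUnitsPerCore) {
      this.name = name;
      this.availableYcus = availableYcus;
      this.maxComputeUnitsPerCore = maxComputeUnitsPerCore;
    }
  }

  // Returns the nodes whose cores are fast enough (minimum YCUs per core) and
  // that have enough free capacity (total YCUs). An empty result means the
  // request cannot be placed on any of the given nodes.
  static List<String> eligibleNodes(List<Node> nodes, int totalYcus, int minYcusPerCore) {
    List<String> result = new ArrayList<>();
    for (Node n : nodes) {
      if (n.maxComputeUnitsPerCore >= minYcusPerCore && n.availableYcus >= totalYcus) {
        result.add(n.name);
      }
    }
    return result;
  }
}
```

In a heterogeneous cluster the per-core threshold narrows the candidate hosts; if no host passes it, the request can be rejected up front.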

I would also like us to have a flag that would either limit the container to 
the requested CPU and let it have no more even when more is available, or would 
let it expand to use whatever CPU was free, but would be guaranteed to get at 
least the YCUs requested.  This is likely something that would have to be done 
on a separate JIRA though.  Without this I don't see a way to really get 
simplicity, predictability, or consistency.  1 MB of RAM is fairly simple to 
understand.  It can be measured without too much of a problem just by running 
the process.  Most users do a simple search for the correct value: run with the 
default, and if it does not work, increase the amount and run again.  1 YCU is 
very complex to measure for an application.  If I cannot restrict a container 
to never use more than what was requested I cannot consistently predict how 
long it will take to run later.  Without this I don't know how to answer the 
question I know will come up.

What should I set these values to?


> Define a virtual core unambigiously
> ---
>
> Key: YARN-1024
> URL: https://issues.apache.org/jira/browse/YARN-1024
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> We need to define the meaning of a virtual core unambiguously so that 
> it's easy to migrate applications between clusters.
> For example, here is the Amazon EC2 definition of an ECU: 
> http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the 
> equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*


[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739829#comment-13739829
 ] 

Hitesh Shah commented on YARN-1055:
---

[~kkambatl] Based on the discussion, I was trying to understand what is 
considered an AM failure vs. an RM restart vs. infra failures. It seems a bit 
confusing from an app developer's point of view that the AM being restarted as a 
result of the RM restarting is considered different from the AM being restarted 
because the NM went down.



> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.
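
One possible shape for "two separate configs" is sketched below, purely as an assumption: `AttemptPolicy` and its fields are hypothetical and are not real YARN settings. The point is only that AM failures consume a budget while an RM restart is governed by its own switch.

```java
// Hypothetical policy object; the names are illustrative, not real YARN configs.
final class AttemptPolicy {
  private final int maxAmFailures;          // how many AM failures to tolerate
  private final boolean recoverOnRmRestart; // whether an RM restart may relaunch the AM
  private int amFailures = 0;

  AttemptPolicy(int maxAmFailures, boolean recoverOnRmRestart) {
    this.maxAmFailures = maxAmFailures;
    this.recoverOnRmRestart = recoverOnRmRestart;
  }

  // Called when the AM itself fails; counts against the AM failure budget.
  boolean shouldRetryAfterAmFailure() {
    amFailures++;
    return amFailures <= maxAmFailures;
  }

  // Called when the RM restarts; does not consume the AM failure budget.
  boolean shouldRecoverAfterRmRestart() {
    return recoverOnRmRestart;
  }
}
```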


[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page

2013-08-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739802#comment-13739802
 ] 

Sangjin Lee commented on YARN-1044:
---

Sounds good. I'll submit a patch soon.

> used/min/max resources do not display info in the scheduler page
> 
>
> Key: YARN-1044
> URL: https://issues.apache.org/jira/browse/YARN-1044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Sangjin Lee
>Priority: Minor
>  Labels: newbie
> Attachments: screenshot.png
>
>
> Go to the scheduler page in RM, and click any queue to display the detailed 
> info. You'll find that none of the resources entries (used, min, or max) 
> would display values.
> It is because the values contain brackets ("<" and ">") and are not properly 
> html-escaped.
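
A minimal sketch of the escaping involved, assuming a resource string wrapped in angle brackets; the class `HtmlEscape` and the sample value are hypothetical, and the eventual patch may take a different approach (for example an existing escaping utility).

```java
// Minimal escaper for illustration; the eventual patch may take a different approach.
final class HtmlEscape {
  private HtmlEscape() {}

  // Replaces the characters that a browser would otherwise treat as markup.
  static String escape(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '"': out.append("&quot;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Assumed sample value shaped like a resource string with angle brackets.
    System.out.println(escape("<memory:1024, vCores:1>"));
  }
}
```

Without the escaping, the browser parses the bracketed value as an unknown tag and renders nothing, which matches the empty fields in the report.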


[jira] [Commented] (YARN-451) Add more metrics to RM page

2013-08-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739801#comment-13739801
 ] 

Sangjin Lee commented on YARN-451:
--

I agree that Hadoop 1 was different as the notion of mappers and reducers was 
explicit from the overview and the RM works in a different way in terms of 
resource allocation.

I am pointing out that from a user perspective there is a feature gap where one 
cannot quickly get a sense of relative sizes of apps/jobs. I also agree that 
the solution should be done in a way such that it doesn't crowd the UI and also 
conforms well with the current RM architecture. Thanks!

> Add more metrics to RM page
> ---
>
> Key: YARN-451
> URL: https://issues.apache.org/jira/browse/YARN-451
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Lohit Vijayarenu
>Priority: Minor
>
> The ResourceManager web UI shows a list of RUNNING applications, but it does not 
> tell which applications are requesting more resources than others. With a 
> cluster running hundreds of applications at once, it would be useful to have 
> some kind of metric to distinguish high-resource-usage applications from 
> low-resource-usage ones. At a minimum, showing the number of containers is a good option.


[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739711#comment-13739711
 ] 

Hudson commented on YARN-1060:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1518 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1518/])
YARN-1060. Two tests in TestFairScheduler are missing @Test annotation 
(Niranjan Singh via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> Two tests in TestFairScheduler are missing @Test annotation
> ---
>
> Key: YARN-1060
> URL: https://issues.apache.org/jira/browse/YARN-1060
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Niranjan Singh
>  Labels: newbie
> Fix For: 2.3.0
>
> Attachments: YARN-1060.patch
>
>
> Amazingly, these tests appear to pass with the annotations added.
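
A JUnit 4 style runner discovers tests through the annotation, so a method without it is skipped silently rather than failing. The sketch below mimics that discovery with plain reflection; `SketchTest`, `ExampleTests`, and `discovered` are hypothetical stand-ins, and this is not JUnit's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for an annotation-driven runner; SketchTest is NOT org.junit.Test.
final class AnnotationDiscovery {
  private AnnotationDiscovery() {}

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface SketchTest {}

  static class ExampleTests {
    @SketchTest
    public void testAnnotated() {}

    // No annotation: this compiles, but the discovery below never sees it.
    public void testForgotten() {}
  }

  // Names of the methods an annotation-driven runner would execute,
  // sorted so the result does not depend on reflection order.
  static List<String> discovered(Class<?> cls) {
    List<String> names = new ArrayList<>();
    for (Method m : cls.getDeclaredMethods()) {
      if (m.isAnnotationPresent(SketchTest.class)) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }
}
```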


[jira] [Commented] (YARN-337) RM handles killed application tracking URL poorly

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739713#comment-13739713
 ] 

Hudson commented on YARN-337:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #4257 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4257/])
YARN-337. RM handles killed application tracking URL poorly. Contributed by 
Jason Lowe (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513888)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


> RM handles killed application tracking URL poorly
> -
>
> Key: YARN-337
> URL: https://issues.apache.org/jira/browse/YARN-337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>  Labels: usability
> Attachments: YARN-337.patch
>
>
> When the ResourceManager kills an application, it leaves the proxy URL 
> redirecting to the original tracking URL for the application even though the 
> ApplicationMaster is no longer there to service it.  It should redirect it 
> somewhere more useful, like the RM's web page for the application, where the 
> user can find that the application was killed and links to the AM logs.
> In addition, sometimes the AM during teardown from the kill can attempt to 
> unregister and provide an updated tracking URL, but unfortunately the RM has 
> "forgotten" the AM due to the kill and refuses to process the unregistration. 
>  Instead it logs:
> {noformat}
> 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_01
> {noformat}
> It should go ahead and process the unregistration to update the tracking URL 
> since the application offered it.


[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739656#comment-13739656
 ] 

Hudson commented on YARN-1060:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1491 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1491/])
YARN-1060. Two tests in TestFairScheduler are missing @Test annotation 
(Niranjan Singh via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> Two tests in TestFairScheduler are missing @Test annotation
> ---
>
> Key: YARN-1060
> URL: https://issues.apache.org/jira/browse/YARN-1060
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Niranjan Singh
>  Labels: newbie
> Fix For: 2.3.0
>
> Attachments: YARN-1060.patch
>
>
> Amazingly, these tests appear to pass with the annotations added.


[jira] [Commented] (YARN-1023) [YARN-321] Webservices REST API's support for Application History

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739634#comment-13739634
 ] 

Hadoop QA commented on YARN-1023:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597960/YARN-1023-v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1711//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1711//console

This message is automatically generated.

> [YARN-321] Webservices REST API's support for Application History
> -
>
> Key: YARN-1023
> URL: https://issues.apache.org/jira/browse/YARN-1023
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-321
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch
>
>



[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage

2013-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739621#comment-13739621
 ] 

Hadoop QA commented on YARN-954:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597959/YARN-954-v2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1712//console

This message is automatically generated.

> [YARN-321] History Service should create the webUI and wire it to 
> HistoryStorage
> 
>
> Key: YARN-954
> URL: https://issues.apache.org/jira/browse/YARN-954
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Devaraj K
> Attachments: YARN-954-v0.patch, YARN-954-v1.patch, YARN-954-v2.patch
>
>



[jira] [Updated] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage

2013-08-14 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-954:
---

Attachment: YARN-954-v2.patch

> [YARN-321] History Service should create the webUI and wire it to 
> HistoryStorage
> 
>
> Key: YARN-954
> URL: https://issues.apache.org/jira/browse/YARN-954
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Devaraj K
> Attachments: YARN-954-v0.patch, YARN-954-v1.patch, YARN-954-v2.patch
>
>



[jira] [Updated] (YARN-1023) [YARN-321] Webservices REST API's support for Application History

2013-08-14 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-1023:


Attachment: YARN-1023-v1.patch

> [YARN-321] Webservices REST API's support for Application History
> -
>
> Key: YARN-1023
> URL: https://issues.apache.org/jira/browse/YARN-1023
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-321
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch
>
>



[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739540#comment-13739540
 ] 

Hudson commented on YARN-1036:
--

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #699 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/699/])
YARN-1036. Distributed Cache gives inconsistent result if cache files get 
deleted from task tracker. Contributed by Mayank Bansal and Ravi Prakash 
(jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513636)
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java


> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: YARN-1036
> URL: https://issues.apache.org/jira/browse/YARN-1036
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.9
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 0.23.10
>
> Attachments: YARN-1036.branch-0.23.patch, 
> YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch
>
>
> This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because 
> that one had been closed. 


[jira] [Commented] (YARN-543) [Umbrella] NodeManager localization related issues

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739541#comment-13739541
 ] 

Hudson commented on YARN-543:
-

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #699 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/699/])
YARN-543. Shared data structures in Public Localizer and Private Localizer are 
not Thread safe. Contributed by Omkar Vinit Joshi and Mit Desai (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513674)
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


> [Umbrella] NodeManager localization related issues
> --
>
> Key: YARN-543
> URL: https://issues.apache.org/jira/browse/YARN-543
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Vinod Kumar Vavilapalli
>
> Seeing a bunch of localization related issues being worked on, this is the 
> tracking ticket.


[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation

2013-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739521#comment-13739521
 ] 

Hudson commented on YARN-1060:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #301 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/301/])
YARN-1060. Two tests in TestFairScheduler are missing @Test annotation 
(Niranjan Singh via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> Two tests in TestFairScheduler are missing @Test annotation
> ---
>
> Key: YARN-1060
> URL: https://issues.apache.org/jira/browse/YARN-1060
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Niranjan Singh
>  Labels: newbie
> Fix For: 2.3.0
>
> Attachments: YARN-1060.patch
>
>
> Amazingly, these tests appear to pass with the annotations added.


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-14 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739424#comment-13739424
 ] 

Junping Du commented on YARN-292:
-

bq. I'll document it as the comment in AMContainerAllocatedTransition.
Thanks. 
bq. CapacityScheduler.applications is already a ConcurrentHashMap, and all the 
methods to access LeafQueue.applicationsMap are synchronized. Therefore, I think 
we don't need to change it.
That's true. thx!

> ResourceManager throws ArrayIndexOutOfBoundsException while handling 
> CONTAINER_ALLOCATED for application attempt
> 
>
> Key: YARN-292
> URL: https://issues.apache.org/jira/browse/YARN-292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Zhijie Shen
> Attachments: YARN-292.1.patch
>
>
> {code:xml}
> 2012-12-26 08:41:15,030 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
> Calling allocate on removed or non existant application 
> appattempt_1356385141279_49525_01
> 2012-12-26 08:41:15,031 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_ALLOCATED for applicationAttempt 
> application_1356385141279_49525
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>   at java.lang.Thread.run(Thread.java:662)
>  {code}


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-14 Thread Zhijie Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739378#comment-13739378
 ] 

Zhijie Shen commented on YARN-292:
--

Thanks for reviewing the patch, Junping!

bq. However, I would suggest documenting why at least one container is 
expected in the allocation, or adding a non-empty check on getContainers().

ScheduleTransition already checks that the number of allocated containers is 
0, which means newlyAllocatedContainers is still empty at that point. 
AMContainerAllocatedTransition comes after ScheduleTransition and is 
triggered by CONTAINER_ALLOCATED, which is emitted only after an RMContainer 
is created and put into newlyAllocatedContainers. Therefore, in 
AMContainerAllocatedTransition, at least one container is expected. I'll 
document this as a comment in AMContainerAllocatedTransition.
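
To illustrate the ordering argument above, here is a minimal sketch, not the 
actual RMAppAttemptImpl code. `AllocationSketch` and its methods are 
hypothetical names. It records the container before emitting the event, and 
it also shows the explicit empty check that was suggested as the alternative 
to a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the scheduler/attempt interaction.
class AllocationSketch {
  private final List<String> newlyAllocatedContainers = new ArrayList<>();
  private final List<String> emittedEvents = new ArrayList<>();

  // The container is recorded first; only then is the event emitted.
  void allocate(String containerId) {
    newlyAllocatedContainers.add(containerId);
    emittedEvents.add("CONTAINER_ALLOCATED");
  }

  // Handler for CONTAINER_ALLOCATED: at least one container is expected.
  // The explicit check turns a bare index error into a descriptive one.
  String onContainerAllocated() {
    if (newlyAllocatedContainers.isEmpty()) {
      throw new IllegalStateException(
          "CONTAINER_ALLOCATED handled but no container has been recorded");
    }
    return newlyAllocatedContainers.get(0);
  }

  List<String> events() {
    return emittedEvents;
  }

  public static void main(String[] args) {
    AllocationSketch sketch = new AllocationSketch();
    sketch.allocate("container_01");
    System.out.println(sketch.onContainerAllocated());
  }
}
```

Whether a check like this belongs in the real transition, or a comment is 
enough, is the judgment call being discussed in this thread.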

bq. but not address CapacityScheduler (applicationsMap should be in class of 
LeafQueue).

CapacityScheduler.applications is already a ConcurrentHashMap, and all the 
methods that access LeafQueue.applicationsMap are synchronized. Therefore, I 
think we don't need to change it.
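
A minimal sketch of the two thread-safety styles mentioned above, assuming 
nothing about the real classes beyond what this comment states. `QueueSketch` 
and `SchedulerSketch` are hypothetical stand-ins for LeafQueue and 
CapacityScheduler: one guards a plain HashMap with synchronized methods, the 
other relies on a ConcurrentHashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Style 1: a plain HashMap, where every accessor is synchronized.
class QueueSketch {
  private final Map<String, String> applicationsMap = new HashMap<>();

  synchronized void add(String appId, String user) {
    applicationsMap.put(appId, user);
  }

  synchronized String remove(String appId) {
    return applicationsMap.remove(appId);
  }

  synchronized int size() {
    return applicationsMap.size();
  }

  // Adds entries from several threads and returns the final size.
  static int fillConcurrently(int threads, int perThread) {
    final QueueSketch queue = new QueueSketch();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int base = t * perThread;
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          queue.add("app_" + (base + i), "user");
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return queue.size();
  }
}

// Style 2: the map itself is thread-safe, so single operations need no lock.
class SchedulerSketch {
  private final Map<String, String> applications = new ConcurrentHashMap<>();

  void add(String appId, String user) {
    applications.put(appId, user);
  }

  int size() {
    return applications.size();
  }
}
```

Both styles make individual map operations safe. Neither makes a sequence of 
several operations atomic on its own, so compound updates still need care.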



[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart

2013-08-14 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739372#comment-13739372
 ] 

Karthik Kambatla commented on YARN-1055:


bq. In case of a network issue where the AM is running but cannot talk to the 
RM or say the NM on which the AM was running goes down, what knob would control 
handling these situations?

For these two cases, I would use the AM-failure knob because the AM is 
"perceived" to have failed. Is there more to the question that I have totally 
missed?
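
To make the mapping concrete, a small sketch under stated assumptions: the 
enum and class names below are hypothetical, and the issue only proposes that 
two separate configs exist, not what they are called. It routes each failure 
cause to the knob that would govern it, with the two cases from the question 
falling under the AM-failure knob.

```java
// Hypothetical failure causes, named for illustration only.
enum FailureCause { AM_CRASHED, AM_UNREACHABLE, NM_LOST, RM_RESTARTED }

// Hypothetical names for the two proposed configs.
enum Knob { AM_FAILURE, RM_RESTART }

class RecoveryKnobSketch {
  // Picks the knob that controls recovery for a given cause.
  static Knob knobFor(FailureCause cause) {
    switch (cause) {
      case AM_CRASHED:
      case AM_UNREACHABLE: // AM is running but cannot talk to the RM
      case NM_LOST:        // the NM hosting the AM went down
        // In all three cases the AM is perceived to have failed.
        return Knob.AM_FAILURE;
      case RM_RESTARTED:
        return Knob.RM_RESTART;
      default:
        throw new IllegalArgumentException("unknown cause: " + cause);
    }
  }

  public static void main(String[] args) {
    for (FailureCause cause : FailureCause.values()) {
      System.out.println(cause + " -> " + knobFor(cause));
    }
  }
}
```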

> Handle app recovery differently for AM failures and RM restart
> --
>
> Key: YARN-1055
> URL: https://issues.apache.org/jira/browse/YARN-1055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-14 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739316#comment-13739316
 ] 

Junping Du commented on YARN-292:
-

Also, I see you only address the Fifo and Fair schedulers, but not 
CapacityScheduler (applicationsMap should be in the LeafQueue class). Shall we 
apply the same change there?


