[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740752#comment-13740752 ] Hadoop QA commented on YARN-1044: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598163/yarn-1044.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1722//console This message is automatically generated. > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
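The description above attributes the blank entries to angle brackets that reach the page unescaped. A minimal sketch of that failure mode and the usual remedy follows. It assumes a resource value is rendered as a string such as `<memory:1024, vCores:1>` (an illustrative format), and the `escapeHtml` helper is hypothetical; it is not the code in yarn-1044.patch.

```java
// Hypothetical helper, NOT taken from yarn-1044.patch: replaces the characters
// a browser would otherwise interpret as HTML markup with entity references.
final class HtmlEscapeSketch {
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed display format of a resource value; unescaped, a browser
        // would parse it as an unknown tag and show nothing.
        String used = "<memory:1024, vCores:1>";
        System.out.println(escapeHtml(used));
    }
}
```

With the brackets escaped, the browser shows the value as text instead of swallowing it as a tag.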
[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1044: -- Attachment: yarn-1044.patch > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1044: -- Attachment: (was: yarn-1044.patch) > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740745#comment-13740745 ] Hadoop QA commented on YARN-1044: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598160/yarn-1044.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1721//console This message is automatically generated. > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1044: -- Attachment: yarn-1044.patch > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1044: -- Attachment: (was: yarn-1044.patch) > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740735#comment-13740735 ] Bikas Saha commented on YARN-1004: -- IMO the intent of yarn.scheduler.max is to be an admin setting that restricts how much resource can be given to any one container. It's exposed via the public YARN API. All schedulers are supposed to enforce the admin value and not determine it for themselves. Hence I don't think it should be scheduler-specific. yarn.scheduler.min is scheduler-internal logic for how it simplifies the bin-packing problem. Both current schedulers use it, and it's not exposed by any YARN API because it's of no use to the user. We may split it to be scheduler-specific, but do we really see either scheduler not using it in the foreseeable future? Perhaps we are causing more grief than good by splitting them. increment-allocation-mb is used only in the fair scheduler. Let's just rename that to be scheduler-specific. > yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed > with the scheduler type > -- > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones.
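The description's last point, deprecating the old config names in favour of the new ones, can be sketched as a lookup that prefers the new key and falls back to the old one. This is only an illustration over a plain `Map`; the class `DeprecatedKeySketch` and its `get` method are hypothetical, the new key name is the one proposed in the issue (not necessarily one that exists), and Hadoop's own `Configuration` deprecation mechanism is not used here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "deprecate the old configs to the new ones".
// Only the Fair Scheduler mapping proposed in the issue is shown.
final class DeprecatedKeySketch {
    // Old key -> proposed new key (a proposal from the issue, not a shipped name).
    private static final Map<String, String> DEPRECATED = new HashMap<>();
    static {
        DEPRECATED.put("yarn.scheduler.minimum-allocation-mb",
                       "yarn.scheduler.fair.minimum-allocation-mb");
    }

    // Returns the value set under newKey; if absent, the value set under an
    // old key that maps to newKey; otherwise defaultValue.
    static String get(Map<String, String> conf, String newKey, String defaultValue) {
        String value = conf.get(newKey);
        if (value != null) {
            return value;
        }
        for (Map.Entry<String, String> e : DEPRECATED.entrySet()) {
            if (e.getValue().equals(newKey) && conf.containsKey(e.getKey())) {
                return conf.get(e.getKey());
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.scheduler.minimum-allocation-mb", "512"); // old name only
        System.out.println(get(conf, "yarn.scheduler.fair.minimum-allocation-mb", "1024"));
    }
}
```

The point of the fallback is that existing site files that set only the old name keep working after a rename.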
[jira] [Commented] (YARN-1048) Add new AMRMClientAsync.getMatchingRequests method taking a Container as parameter
[ https://issues.apache.org/jira/browse/YARN-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740731#comment-13740731 ] Hitesh Shah commented on YARN-1048: --- [~josephkniest] That would be org.apache.hadoop.yarn.api.records.Container. As a general rule for the api package: it is restricted to classes within the api layer and uses nothing from other server-side/impl packages. > Add new AMRMClientAsync.getMatchingRequests method taking a Container as > parameter > -- > > Key: YARN-1048 > URL: https://issues.apache.org/jira/browse/YARN-1048 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur > > The current method signature {{getMatchingRequests(Priority priority, String > resourceName, Resource resource)}} is awkward to use within > {{onContainersAllocated(List<Container> containers)}} as we have to > deconstruct the info from the received containers. > A new signature, {{getMatchingRequests(Container container)}} would simplify > usage for clients.
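The request in this issue amounts to a convenience overload that pulls the lookup arguments out of the allocated container. A rough sketch of that shape is below. The nested `Container` and `Request` classes are simplified stand-ins invented for this example (the real records live in org.apache.hadoop.yarn.api.records), the matching rule is an assumption, and using the container's host as the resource name is likewise an assumption rather than something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the proposed overload; none of these
// types are the real YARN classes.
final class MatchingRequestsSketch {
    static final class Container {
        final int priority; final String host; final int memoryMb;
        Container(int priority, String host, int memoryMb) {
            this.priority = priority; this.host = host; this.memoryMb = memoryMb;
        }
    }

    static final class Request {
        final int priority; final String resourceName; final int memoryMb;
        Request(int priority, String resourceName, int memoryMb) {
            this.priority = priority; this.resourceName = resourceName; this.memoryMb = memoryMb;
        }
    }

    private final List<Request> outstanding = new ArrayList<>();

    void addRequest(Request r) { outstanding.add(r); }

    // Shape of the existing three-argument lookup (matching rule assumed:
    // same priority, same resource name, request fits in the given memory).
    List<Request> getMatchingRequests(int priority, String resourceName, int memoryMb) {
        List<Request> matches = new ArrayList<>();
        for (Request r : outstanding) {
            if (r.priority == priority && r.resourceName.equals(resourceName)
                    && r.memoryMb <= memoryMb) {
                matches.add(r);
            }
        }
        return matches;
    }

    // Proposed convenience overload: the caller no longer deconstructs the container.
    List<Request> getMatchingRequests(Container c) {
        return getMatchingRequests(c.priority, c.host, c.memoryMb);
    }

    public static void main(String[] args) {
        MatchingRequestsSketch client = new MatchingRequestsSketch();
        client.addRequest(new Request(1, "node1", 1024));
        client.addRequest(new Request(1, "node2", 1024));
        Container allocated = new Container(1, "node1", 2048);
        System.out.println(client.getMatchingRequests(allocated).size());
    }
}
```

Inside an allocation callback, the client would then pass each received container straight to the one-argument method.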
[jira] [Commented] (YARN-1048) Add new AMRMClientAsync.getMatchingRequests method taking a Container as parameter
[ https://issues.apache.org/jira/browse/YARN-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740729#comment-13740729 ] Joseph Kniest commented on YARN-1048: - Hi, I'm new to Hadoop. To start on this, I'm trying to look up the Container class, but several classes with that name have 'yarn' in their package name, and there's also an interface called that. Can you give me the fully qualified name of the Container class/interface that should be the parameter of the method above? > Add new AMRMClientAsync.getMatchingRequests method taking a Container as > parameter > -- > > Key: YARN-1048 > URL: https://issues.apache.org/jira/browse/YARN-1048 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur > > The current method signature {{getMatchingRequests(Priority priority, String > resourceName, Resource resource)}} is awkward to use within > {{onContainersAllocated(List<Container> containers)}} as we have to > deconstruct the info from the received containers. > A new signature, {{getMatchingRequests(Container container)}} would simplify > usage for clients.
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740694#comment-13740694 ] Hitesh Shah commented on YARN-1006: --- [~vinodkv] [~xgong] Is this a blocker for 2.1.0? > Nodes list web page on the RM web UI is broken > -- > > Key: YARN-1006 > URL: https://issues.apache.org/jira/browse/YARN-1006 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > Attachments: YARN-1006.1.patch > > > The nodes web page which list all the connected nodes of the cluster is > broken. > 1. The page is not showing in correct format/style. > 2. If we restart the NM, the node list is not refreshed, but just add the new > started NM to the list. The old NMs information still remain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740691#comment-13740691 ] Hadoop QA commented on YARN-1006: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598150/YARN-1006.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1720//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1720//console This message is automatically generated. > Nodes list web page on the RM web UI is broken > -- > > Key: YARN-1006 > URL: https://issues.apache.org/jira/browse/YARN-1006 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > Attachments: YARN-1006.1.patch > > > The nodes web page which list all the connected nodes of the cluster is > broken. > 1. The page is not showing in correct format/style. > 2. If we restart the NM, the node list is not refreshed, but just add the new > started NM to the list. The old NMs information still remain.
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740673#comment-13740673 ] Xuan Gong commented on YARN-1006: - Trivial patch. No tests added > Nodes list web page on the RM web UI is broken > -- > > Key: YARN-1006 > URL: https://issues.apache.org/jira/browse/YARN-1006 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > Attachments: YARN-1006.1.patch > > > The nodes web page which list all the connected nodes of the cluster is > broken. > 1. The page is not showing in correct format/style. > 2. If we restart the NM, the node list is not refreshed, but just add the new > started NM to the list. The old NMs information still remain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1006: Attachment: YARN-1006.1.patch The page is not showing in the correct format/style because in YARN-686 we flattened the nodeReport and deleted Health-status from it, but did not update the column index in the nodesTableInit function. Updating the column index should fix the issue. > Nodes list web page on the RM web UI is broken > -- > > Key: YARN-1006 > URL: https://issues.apache.org/jira/browse/YARN-1006 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > Attachments: YARN-1006.1.patch > > > The nodes web page which list all the connected nodes of the cluster is > broken. > 1. The page is not showing in correct format/style. > 2. If we restart the NM, the node list is not refreshed, but just add the new > started NM to the list. The old NMs information still remain.
[jira] [Updated] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1004: - Fix Version/s: 2.1.0-beta Setting the fix version to 2.1.0-beta so we don't miss it before cutting the RC. > yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed > with the scheduler type > -- > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones.
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740657#comment-13740657 ] Alejandro Abdelnur commented on YARN-1064: -- The ones that caught my eye are: {code} YARN_PREFIX + "scheduler.minimum-allocation-mb"; YARN_PREFIX + "scheduler.minimum-allocation-vcores"; YARN_PREFIX + "scheduler.maximum-allocation-mb"; YARN_PREFIX + "scheduler.maximum-allocation-vcores"; RM_PREFIX + "scheduler.client.thread-count"; RM_PREFIX + "scheduler.monitor.enable"; RM_PREFIX + "scheduler.monitor.policies"; {code} YARN-1004 would take care of the first 2. What about the last 3: are they a false positive on my side, and is it OK that they stay in the RM? If so, we can close this as invalid. > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Labels: newbie > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix.
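To make the inconsistency in the comment above concrete, the sketch below expands a few of the listed constants into the property names an administrator would actually type. It assumes `YARN_PREFIX` is `"yarn."` and `RM_PREFIX` is `"yarn.resourcemanager."`; those values are stated here as an assumption about YarnConfiguration rather than quoted from the thread, and the `PrefixSketch` class itself is purely illustrative.

```java
// Illustrative only: mirrors the concatenations quoted in the comment to show
// that scheduler settings end up under two different property namespaces.
final class PrefixSketch {
    // Assumed prefix values (not quoted in the thread).
    static final String YARN_PREFIX = "yarn.";
    static final String RM_PREFIX = YARN_PREFIX + "resourcemanager.";

    static final String[] KEYS = {
        YARN_PREFIX + "scheduler.minimum-allocation-mb",
        YARN_PREFIX + "scheduler.maximum-allocation-mb",
        RM_PREFIX + "scheduler.client.thread-count",
        RM_PREFIX + "scheduler.monitor.enable",
    };

    public static void main(String[] args) {
        // Prints one resolved property name per line.
        for (String key : KEYS) {
            System.out.println(key);
        }
    }
}
```

Under those assumed prefixes, the first two keys resolve under `yarn.scheduler.` and the last two under `yarn.resourcemanager.scheduler.`, which is the split the issue is asking about.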
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740650#comment-13740650 ] Alejandro Abdelnur commented on YARN-1055: -- [~bikassaha], [~vinodkv], in Hadoop 1, because the RM and MRAM logic are done by a single component, the JT, there is no need for this additional setting. Because in Hadoop 2 the failure can be of the AM or the RM, we need to be able to detect which one failed. This is a regression of functionality that should be addressed. I would use the same argument being used in MAPREDUCE-5311 in favor of keeping around, in Hadoop 2, functionality from Hadoop 1 that users rely on. Eventually Oozie and component clients will evolve to fully leverage YARN capabilities, but it will take a while; we have to give a hand and provide stopgaps. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740624#comment-13740624 ] Hudson commented on YARN-1056: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4263 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4263/]) YARN-1056. Remove dual use of string 'resourcemanager' in yarn.resourcemanager.connect.{max.wait.secs|retry_interval.secs}. Contributed by Karthik Kambatla. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514135) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Trivial > Labels: conf > Fix For: 2.1.0-beta > > Attachments: yarn-1056-1.patch, yarn-1056-1.patch, yarn-1056-2.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. 
[jira] [Updated] (YARN-1056) Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
[ https://issues.apache.org/jira/browse/YARN-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1056: Priority: Trivial (was: Blocker) > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > > > Key: YARN-1056 > URL: https://issues.apache.org/jira/browse/YARN-1056 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Trivial > Labels: conf > Attachments: yarn-1056-1.patch, yarn-1056-1.patch, yarn-1056-2.patch > > > Fix configs > yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} > to have a *resourcemanager* only once, make them consistent with other such > yarn configs and add entries in yarn-default.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740601#comment-13740601 ] Vinod Kumar Vavilapalli commented on YARN-1064: --- Can you list the specific changes that you are proposing? Asking because some of the configs that are common to all schedulers are termed RM configs, so.. BTW, I did think of fixing all config names, but felt it was too late. If possible, we should avoid it. If only we paid more attention in reviews, we wouldn't need these major configuration name surgeries. I'm leaning towards keeping the names as they are, instead of changing them now and creating lots of confusion. And I request everyone to take more care when giving a +1 to patches that add config names; we should definitely have a config name guide. > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Labels: newbie > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix.
[jira] [Commented] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740597#comment-13740597 ] Vinod Kumar Vavilapalli commented on YARN-1004: --- As I mentioned, Fifo and CS both already depend on it and give it the same meaning. Only FairScheduler diverges in meaning. Adding a new tag in the name is akin to adding more description. Let's not break the minimum config now, particularly given MAPREDUCE-5311's dependency on a min-allocation config. > yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed > with the scheduler type > -- > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Priority: Blocker > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones.
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740576#comment-13740576 ] Vinod Kumar Vavilapalli commented on YARN-1055: --- Same here :) We do understand the underlying issue; we were just trying to converge on the correct solution. To summarize: - For the restart case, work-preserving restart solves the problem of killing AMs unnecessarily. - For node failures or AM crashes, we already have a knob. To avoid split-brain issues, Oozie/Pig/Hive all need to implement restartability for their launchers. Given the latter isn't coming in a rush, we should make Oozie set max-attempts to 1 for the launcher. Regarding the question of dependent AMs: dependent or not, YARN will only restart AM by AM. If the apps care, they need to implement recoverability. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740500#comment-13740500 ] Bikas Saha commented on YARN-1055: -- That's exactly what I was trying to say earlier: RM restart is not creating a new problem here, and we don't need a restart-specific config. The restart-specific case will be solved when we put in work-preserving restart shortly. The generic problem still exists and needs to be fixed in Oozie, because only Oozie knows which parts of the pipeline can be restarted and which cannot. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740495#comment-13740495 ] Karthik Kambatla commented on YARN-1055: Thinking more about it, the issue is not limited to RM failure. This happens even in the case where a node running the launcher goes down. The underlying issue seems to be in handling the dependency between AMs and wanting to tolerate failures of some of these AMs and not others. Given that adding the config won't solve the issue completely, I agree that it is not a good idea to fix it for RM restart alone. Thanks Bikas, Vinod, Hitesh, Alejandro for the detailed discussion. The issue, however, exists with dependent AMs and needs to be handled, maybe in Oozie for now? In the long term, would it make any sense for YARN to support inter-dependent AMs? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740474#comment-13740474 ] Bikas Saha commented on YARN-1055: -- Why does the launcher not retry the action? Is there a JIRA in OOZIE to make it work properly in such cases by doing its own book-keeping? Isn't it more correct to fix OOZIE instead of adding a workaround config in YARN? Is the current situation acceptable as a known short-term bug? From what I see, nothing wrong will happen functionally/practically. In the infrequent case of the action-AM node crashing, the pipeline would have to be restarted. We have a design for work-preserving RM restart that can be completed post beta. This will remove the need to restart AMs. Given that, I am really averse to adding a short-term workaround API in AppSubmissionContext that will have to be maintained till YARN-3.0 comes out, because we are guaranteeing APIs post beta. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740446#comment-13740446 ] Robert Kanter commented on YARN-1055: - Another way of phrasing this: when the action's AM dies, we want to recover it (and the launcher can still monitor it with JobClient as-is), but if the action and launcher AMs both die due to an RM restart, we don't want to recover the action's AM. Hence in the first case, we'd want the max-am-retries set to >1 and in the second case we'd want it set to =1. But it can't be both. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
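The "it can't be both" point above can be made concrete with a toy model. The sketch below is not YARN code: it assumes, as the thread describes, that a single max-attempts value is consulted no matter why the AM went away, and the names MaxAttemptsModel, recovers and satisfiesOozie are hypothetical.

```java
// Toy model of a single max-attempts knob shared by every failure cause.
final class MaxAttemptsModel {
    enum Cause { AM_FAILURE, RM_RESTART }

    // One knob for all causes: the cause is deliberately ignored, which is
    // exactly the limitation being discussed in this thread.
    static boolean recovers(Cause cause, int maxAttempts, int attemptsUsed) {
        return attemptsUsed < maxAttempts;
    }

    // What the comment above asks for on the action job: recover after an
    // AM failure, but do not recover after an RM restart.
    static boolean satisfiesOozie(int maxAttempts) {
        boolean onAmFailure = recovers(Cause.AM_FAILURE, maxAttempts, 1);
        boolean onRmRestart = recovers(Cause.RM_RESTART, maxAttempts, 1);
        return onAmFailure && !onRmRestart;
    }

    public static void main(String[] args) {
        for (int m = 1; m <= 4; m++) {
            System.out.println("max-attempts=" + m + " satisfies both: " + satisfiesOozie(m));
        }
    }
}
```

Because both answers come from the same comparison, no value of the knob yields "recover" for one cause and "do not recover" for the other, which is why the thread debates a second config versus fixing the launcher.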
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740436#comment-13740436 ] Karthik Kambatla commented on YARN-1055: This problem doesn't exist in Hadoop-1 because JobTracker plays the role of RM and AM. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740437#comment-13740437 ] Sangjin Lee commented on YARN-1044: --- By the way, the xml issue with the capacity scheduler should be fixed, but it's a somewhat separate problem that would call for a different solution (jaxb-specific). I think it should be a separate ticket. > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740439#comment-13740439 ] Hadoop QA commented on YARN-1044: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598094/yarn-1044.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1719//console This message is automatically generated. > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740435#comment-13740435 ] Karthik Kambatla commented on YARN-1055: In Hadoop 1, we set the job.recovery.enable setting to true for the launcher job and false for the action job. When JT restarts, the launcher alone is recovered. The recovered launcher then starts the action exactly the same way as before. In Hadoop 2, that translates to setting the max-am-retries to > 1 for the launcher job and = 1 for the action job. When RM restarts, the launcher alone is recovered, which restarts the action. However, if the action-AM alone dies (due to the node running it crashing etc.) and the launcher-AM doesn't, the launcher does not retry the action. IOW, the failure is ignored. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1044: -- Attachment: yarn-1044.patch Proposed patch for escaping invalid characters for html. > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png, yarn-1044.patch > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
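For readers without the attachment, the kind of escaping the description calls for might look like the sketch below. It is not the attached patch; the class HtmlEscape is hypothetical, and the sample value "<memory:1024, vCores:1>" is only an assumed example of a resource string containing angle brackets.

```java
// Minimal HTML escaping for text that is written into a page as-is.
final class HtmlEscape {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unescaped, a browser would treat the bracketed value as an unknown tag
        // and render nothing, which matches the symptom in the description.
        System.out.println(escape("<memory:1024, vCores:1>"));
    }
}
```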
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740367#comment-13740367 ] Bikas Saha commented on YARN-1055: -- How does it work in hadoop 1 then? From what I see the externally visible behavior of JT and RM is identical in both cases. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740362#comment-13740362 ] Karthik Kambatla commented on YARN-1055: As in my comment from above (https://issues.apache.org/jira/browse/YARN-1055?focusedCommentId=13737487&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13737487), max-am-retries is not enough for Oozie without significant changes to how Oozie launcher is implemented. To work around this, Oozie launcher will have to monitor the action and re-submit the action in case the action AM fails > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-1065: -- Target Version/s: 2.1.1-beta > NM should provide AuxillaryService data to the container > > > Key: YARN-1065 > URL: https://issues.apache.org/jira/browse/YARN-1065 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 2.0.4-alpha >Reporter: Bikas Saha > > Start container returns auxillary service data to the AM but does not provide > the same information to the task itself. It could add that information to the > container env with key=service_name and value=service_data. This allows the > container to start using the service without having to depend on the AM to > send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-1065: -- Affects Version/s: (was: 2.0.4-alpha) > NM should provide AuxillaryService data to the container > > > Key: YARN-1065 > URL: https://issues.apache.org/jira/browse/YARN-1065 > Project: Hadoop YARN > Issue Type: Task >Reporter: Bikas Saha > > Start container returns auxillary service data to the AM but does not provide > the same information to the task itself. It could add that information to the > container env with key=service_name and value=service_data. This allows the > container to start using the service without having to depend on the AM to > send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1065) NM should provide AuxillaryService data to the container
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-1065: -- Affects Version/s: (was: 2.1.1-beta) 2.0.4-alpha > NM should provide AuxillaryService data to the container > > > Key: YARN-1065 > URL: https://issues.apache.org/jira/browse/YARN-1065 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 2.0.4-alpha >Reporter: Bikas Saha > > Start container returns auxillary service data to the AM but does not provide > the same information to the task itself. It could add that information to the > container env with key=service_name and value=service_data. This allows the > container to start using the service without having to depend on the AM to > send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1065) NM should provide AuxillaryService data to the container
Bikas Saha created YARN-1065: Summary: NM should provide AuxillaryService data to the container Key: YARN-1065 URL: https://issues.apache.org/jira/browse/YARN-1065 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.1.1-beta Reporter: Bikas Saha Start container returns auxillary service data to the AM but does not provide the same information to the task itself. It could add that information to the container env with key=service_name and value=service_data. This allows the container to start using the service without having to depend on the AM to send the info to it indirectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
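A rough sketch of the proposal, under stated assumptions: the service data is treated as an opaque byte buffer per service name, and it is Base64-encoded so that it fits in an environment variable. The encoding choice, the class AuxServiceEnv and both method names are assumptions made for illustration; the issue itself only specifies key=service_name and value=service_data.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper pair: one side runs where the container env is built,
// the other inside the launched container.
final class AuxServiceEnv {
    // Adds one env entry per auxiliary service: key = service name,
    // value = Base64 of the service data (encoding is an assumption).
    static void addServiceData(Map<String, String> env, Map<String, ByteBuffer> serviceData) {
        for (Map.Entry<String, ByteBuffer> e : serviceData.entrySet()) {
            ByteBuffer copy = e.getValue().duplicate(); // leave the caller's position untouched
            byte[] bytes = new byte[copy.remaining()];
            copy.get(bytes);
            env.put(e.getKey(), Base64.getEncoder().encodeToString(bytes));
        }
    }

    // Reads the data back inside the container; returns null if the service is absent.
    static byte[] readServiceData(Map<String, String> env, String serviceName) {
        String value = env.get(serviceName);
        return value == null ? null : Base64.getDecoder().decode(value);
    }
}
```

Inside the task, readServiceData(System.getenv(), name) would then give the same bytes the AM receives, without the AM having to forward them.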
[jira] [Commented] (YARN-1049) ContainerExistStatus and ContainerState are defined incorrectly
[ https://issues.apache.org/jira/browse/YARN-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740265#comment-13740265 ] Zhijie Shen commented on YARN-1049: --- "ContainerExitStatus defines a few constants with special exit status values (0, -1000, -100, -101). This is incorrect, we should not define any special constants and limit to return the actual process exit status code." In addition to ContainerExitStatus, ExitCode also defines 137 and 143. However, apart from these values, a container's exit code usually comes from the exit value of its process. Are you concerned that the value from the process may conflict with the self-defined ones? > ContainerExistStatus and ContainerState are defined incorrectly > --- > > Key: YARN-1049 > URL: https://issues.apache.org/jira/browse/YARN-1049 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.1.0-beta > > > ContainerExitStatus defines a few constants with special exit status values > (0, -1000, -100, -101). This is incorrect, we should not define any special > constants and limit to return the actual process exit status code. > ContainerState should include PREEMPTED (when preempted by YARN), LOST (when > the NM crashes). > With the current behavior it is impossible to determine if a container has been > preempted or lost due to a NM crash. > Marking it as a blocker for 2.1.0 as this is an API/behavior change.
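On the question of whether process exit values can collide with the special ones, a small classification sketch may help. It uses only the numbers quoted in this thread (0, -1000, -100, -101, 137, 143); the class ExitStatusSketch and its wording are hypothetical. It relies on two general facts rather than on YARN internals: a POSIX process exit code is in the range 0..255, and shells conventionally report death by signal n as 128 + n.

```java
// Classifies a container exit status using only general OS conventions.
final class ExitStatusSketch {
    static String describe(int status) {
        if (status == 0) {
            return "success";
        }
        if (status < 0) {
            // A real process cannot produce a negative code, so values such as
            // -1000, -100 and -101 cannot collide with a process exit value.
            return "framework-defined status " + status;
        }
        if (status > 128 && status < 256) {
            // Shell convention: 128 + signal number (137 -> SIGKILL, 143 -> SIGTERM).
            return "terminated by signal " + (status - 128);
        }
        return "process exit code " + status;
    }

    public static void main(String[] args) {
        int[] samples = {0, -1000, -100, -101, 137, 143, 1};
        for (int s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Under these assumptions only 137 and 143 share a range with ordinary process exit codes, since a process may also exit with those values on its own.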
[jira] [Commented] (YARN-305) Too many 'Node offerred to app:..." messages in RM
[ https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740264#comment-13740264 ] Hadoop QA commented on YARN-305: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598064/YARN-305.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1718//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1718//console This message is automatically generated. > Too many 'Node offerred to app:..." messages in RM > -- > > Key: YARN-305 > URL: https://issues.apache.org/jira/browse/YARN-305 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Lohit Vijayarenu >Priority: Minor > Attachments: YARN-305.1.patch > > > Running fair scheduler YARN shows that RM has lots of messages like the below. 
> {noformat} > INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: > Node offered to app: application_1357147147433_0002 reserved: false > {noformat} > They dont seem to tell much and same line is dumped many times in RM log. It > would be good to have it improved with node information or moved to some > other logging level with enough debug information -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
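Both suggestions in the description (add node information, or move the line to a debug level) can be sketched together. The example uses java.util.logging so that it runs without extra libraries; the scheduler itself may use a different logging API, and the class OfferLogging, its methods and the node name "host1:1234" are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: include the node in the message and emit it only at debug (FINE) level.
final class OfferLogging {
    static final Logger LOG = Logger.getLogger(OfferLogging.class.getName());

    static String offerMessage(String nodeName, String appId, boolean reserved) {
        return "Node " + nodeName + " offered to app: " + appId + " reserved: " + reserved;
    }

    // Returns true if the message was handed to the logger.
    static boolean logOffer(String nodeName, String appId, boolean reserved) {
        if (!LOG.isLoggable(Level.FINE)) {
            return false; // skip building the string on the hot scheduling path
        }
        LOG.fine(offerMessage(nodeName, appId, reserved));
        return true;
    }

    public static void main(String[] args) {
        // At the default INFO level this prints "logged: false".
        System.out.println("logged: " + logOffer("host1:1234", "application_1357147147433_0002", false));
    }
}
```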
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740249#comment-13740249 ] Alejandro Abdelnur commented on YARN-1055: -- [~rkanter], [~kkambatl], can you please see if max-am-retries is enough for what Oozie needs? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740242#comment-13740242 ] Bikas Saha commented on YARN-1055: -- First of all, whatever needs to be set must be set in the AppSubmissionContext API for that job. Only that is job-specific, and this config cannot be global across all jobs. By MAPREDUCE-4824, on job submission, we set a property in the job conf (that is job-specific) saying not to retry the job. In YARN, on job submission, in the AppSubmissionContext API (that is job-specific), we say that max-am-retries = 1. For a job that cannot be restarted (either due to AM crash or node crash or RM restart, AND all these are indistinguishable with respect to the job), the per-job max-am-retries needs to be set to 1. It's probably 2 weeks' worth of work to remove RM restart from the above list. Even after that, such a job needs to set max-am-retries = 1 so that the RM does not restart the job when the node crashes or the AM crashes. Why does an RM-restart-related special API need to be added now? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740239#comment-13740239 ] Zhijie Shen commented on YARN-1064: --- It sounds good to unify the prefix. Would it be better to use "yarn.resourcemanager.scheduler"? Shall we consider compatibility with the early 2.x versions? Maybe we can deprecate, but not remove, the ones beginning with YARN_PREFIX. > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Labels: newbie > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix.
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740228#comment-13740228 ] Hadoop QA commented on YARN-1008: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598055/YARN-1008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1717//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1717//console This message is automatically generated. 
> MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch, YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
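The collision described above is easy to reproduce in isolation. The sketch below is not scheduler code: NodeKeys, key and distinctKeys are hypothetical names, and the boolean flag stands in for whatever option would enable host:port keys.

```java
import java.util.HashSet;
import java.util.Set;

// Shows why several NodeManagers on one host collapse to a single
// allocation key unless the port is part of the key.
final class NodeKeys {
    static String key(String host, int port, boolean includePort) {
        return includePort ? host + ":" + port : host;
    }

    static int distinctKeys(String host, int[] ports, boolean includePort) {
        Set<String> keys = new HashSet<>();
        for (int port : ports) {
            keys.add(key(host, port, includePort));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        int[] ports = {1001, 1002, 1003}; // three NMs in one MiniYARNCluster process
        System.out.println("hostname only: " + distinctKeys("localhost", ports, false));
        System.out.println("host:port:     " + distinctKeys("localhost", ports, true));
    }
}
```

With hostname-only keys the three nodes are indistinguishable to anything that looks them up by key; including the port restores one key per NodeManager.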
[jira] [Updated] (YARN-305) Too many 'Node offerred to app:..." messages in RM
[ https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lohit Vijayarenu updated YARN-305: -- Attachment: YARN-305.1.patch Had generated diff from old branch. Reattaching diff. > Too many 'Node offerred to app:..." messages in RM > -- > > Key: YARN-305 > URL: https://issues.apache.org/jira/browse/YARN-305 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Lohit Vijayarenu >Priority: Minor > Attachments: YARN-305.1.patch > > > Running fair scheduler YARN shows that RM has lots of messages like the below. > {noformat} > INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: > Node offered to app: application_1357147147433_0002 reserved: false > {noformat} > They dont seem to tell much and same line is dumped many times in RM log. It > would be good to have it improved with node information or moved to some > other logging level with enough debug information -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-305) Too many 'Node offerred to app:..." messages in RM
[ https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lohit Vijayarenu updated YARN-305: -- Attachment: (was: YARN-305.1.patch) > Too many 'Node offerred to app:..." messages in RM > -- > > Key: YARN-305 > URL: https://issues.apache.org/jira/browse/YARN-305 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Lohit Vijayarenu >Priority: Minor > > Running fair scheduler YARN shows that RM has lots of messages like the below. > {noformat} > INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: > Node offered to app: application_1357147147433_0002 reserved: false > {noformat} > They dont seem to tell much and same line is dumped many times in RM log. It > would be good to have it improved with node information or moved to some > other logging level with enough debug information -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740216#comment-13740216 ] Sandy Ryza commented on YARN-1064: -- I think this is only for the scheduler configs. Do you think there is a fundamental difference between the ones that start with "yarn.resourcemanager.scheduler" and the ones that start with "yarn.scheduler".? > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Labels: newbie > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740211#comment-13740211 ] Omkar Vinit Joshi commented on YARN-1008: - +1.. lgtm > MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch, YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740207#comment-13740207 ] Omkar Vinit Joshi commented on YARN-1064: - For example, RM_PREFIX clearly means the setting is for the RM; similarly, NM_PREFIX is for the NM and YARN_PREFIX is for other general settings. If we use a common prefix, there will be no point in having any prefix at all, since all YARN-specific configurations go into yarn-site.xml, which is meant for YarnConfiguration only. Let me know if you disagree. > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Labels: newbie > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
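The prefix layering discussed in this thread can be sketched roughly as below. The string values and the two key constants are assumptions for illustration, modelled on the "yarn.resourcemanager.scheduler" and "yarn.scheduler" names mentioned in the discussion; check YarnConfiguration itself for the real definitions. The class `PrefixSketch` and the helper `isRmScoped` are hypothetical.

```java
/** Hypothetical illustration of component-scoped configuration prefixes. */
class PrefixSketch {
    // Assumed values; the real constants live in YarnConfiguration.
    static final String YARN_PREFIX = "yarn.";
    static final String RM_PREFIX = YARN_PREFIX + "resourcemanager.";
    static final String NM_PREFIX = YARN_PREFIX + "nodemanager.";

    // Two scheduler-related keys that end up under different prefixes,
    // which is the inconsistency the issue points at.
    static final String SCHEDULER_CLASS = RM_PREFIX + "scheduler.class";
    static final String SCHEDULER_MIN_MB = YARN_PREFIX + "scheduler.minimum-allocation-mb";

    /** Returns true when the key is scoped to the ResourceManager prefix. */
    static boolean isRmScoped(String key) {
        return key.startsWith(RM_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(SCHEDULER_CLASS + " RM-scoped: " + isRmScoped(SCHEDULER_CLASS));
        System.out.println(SCHEDULER_MIN_MB + " RM-scoped: " + isRmScoped(SCHEDULER_MIN_MB));
    }
}
```

Both positions in the thread are visible here: the prefix tells a reader which daemon consumes a key, yet two keys that configure the same scheduler do not share one.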
[jira] [Updated] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1064: Labels: newbie (was: ) > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Labels: newbie > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740204#comment-13740204 ] Alejandro Abdelnur commented on YARN-1055: -- [~bikassaha], bq. Restart on am failure is already determined by the default value of max am retries in yarn config. Setting that to 1 will prevent RM from restarting AM's on failure. Thus no need for new config. Restart after RM restart is already covered by setting max am retries to 1 by the app client on app submission. Are we talking about the same property here? If so, I don't see how you can differentiate between AM failure and RM restart. bq. If an app cannot handle this situation it should create its own config and set the correct value of 1 on submission. YARN should not add a config IMO. If I remember right, this config is being imported from hadoop 1 and the impl of this config in hadoop 1 is what RM already does to handle user defined max am retries. Oozie is using MRAM for the launcher job, so it is not a question of the AM not handling it. The problem is that to Oozie the MR jobs started by hive/distcp/sqoop are opaque until the jobs complete (it is a limitation of the clients of these components). With MAPREDUCE-4824, in Hadoop 1 we can specify the number of retries for a task (that would be equivalent to specifying the number of AM retries) and we can specify if a job is recoverable or not. We need the equivalent of MAPREDUCE-4824 in Hadoop 2. Unless I'm missing something, this is not available. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. 
App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-305) Too many 'Node offerred to app:..." messages in RM
[ https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740201#comment-13740201 ] Hadoop QA commented on YARN-305: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598058/YARN-305.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1716//console This message is automatically generated. > Too many 'Node offerred to app:..." messages in RM > -- > > Key: YARN-305 > URL: https://issues.apache.org/jira/browse/YARN-305 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Lohit Vijayarenu >Priority: Minor > Attachments: YARN-305.1.patch > > > Running fair scheduler YARN shows that RM has lots of messages like the below. > {noformat} > INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: > Node offered to app: application_1357147147433_0002 reserved: false > {noformat} > They dont seem to tell much and same line is dumped many times in RM log. It > would be good to have it improved with node information or moved to some > other logging level with enough debug information -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740198#comment-13740198 ] Sandy Ryza commented on YARN-1008: -- +1 > MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch, YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-305) Too many 'Node offerred to app:..." messages in RM
[ https://issues.apache.org/jira/browse/YARN-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lohit Vijayarenu updated YARN-305: -- Attachment: YARN-305.1.patch Simple patch to change the log level to debug and add node information. I also saw a similar case while offering a node to a queue, so I added node information there as well. Could not think of a test case, as this only changes the log level. > Too many 'Node offerred to app:..." messages in RM > -- > > Key: YARN-305 > URL: https://issues.apache.org/jira/browse/YARN-305 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Lohit Vijayarenu >Priority: Minor > Attachments: YARN-305.1.patch > > > Running fair scheduler YARN shows that RM has lots of messages like the below. > {noformat} > INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: > Node offered to app: application_1357147147433_0002 reserved: false > {noformat} > They dont seem to tell much and same line is dumped many times in RM log. It > would be good to have it improved with node information or moved to some > other logging level with enough debug information -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
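The kind of change described in this update (a lower log level plus node information) might look roughly like the sketch below. It is not the attached patch: it uses `java.util.logging` so that it is self-contained, with `Level.FINE` standing in for a debug level, and the class `OfferLogSketch`, its methods, and the exact message wording are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration; the real scheduler code uses Hadoop's own logger. */
class OfferLogSketch {
    private static final Logger LOG = Logger.getLogger(OfferLogSketch.class.getName());

    /** Formats the offer message with the node name included. */
    static String offerMessage(String nodeName, String appId, boolean reserved) {
        return "Node " + nodeName + " offered to app: " + appId + " reserved: " + reserved;
    }

    /** Logs the offer at debug (FINE) level; returns true if a line was written. */
    static boolean logOffer(String nodeName, String appId, boolean reserved) {
        // Guard the call so the message string is only built when the level is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(offerMessage(nodeName, appId, reserved));
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean written = logOffer("host1:8041", "application_1357147147433_0002", false);
        System.out.println("debug line written: " + written);
    }
}
```

The guard keeps the per-offer cost close to zero when debug logging is off, which matters for a message emitted on every scheduling attempt.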
[jira] [Updated] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1008: - Attachment: YARN-1008.patch missed the dot before node, fixed. also added javadocs to the MiniYARNCluster class indicating the use of this property and how it works. > MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch, YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740178#comment-13740178 ] Alejandro Abdelnur commented on YARN-1064: -- [~ojoshi], can you please explain why it is more intuitive not to be consistent? > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740167#comment-13740167 ] Bikas Saha commented on YARN-1063: -- Can you please provide some overall design approach. Pros cons etc. > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: trunk-win > Environment: Windows >Reporter: Kyle Leckie > Labels: security > Fix For: trunk-win > > Attachments: YARN-732.patch > > > Task isolation requires the ability to launch tasks in the context of a > particular domain user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740162#comment-13740162 ] Karthik Kambatla commented on YARN-1055: [~hitesh], you are right - we should be careful in labeling failures one way or the other. We should probably classify the failures from a user-perspective and then look into what configs are required. At the least, I see the following different classes: # Non-AM container/task failures # AM container failures # Bunch of (related) AMs failing due to node failures - nodes crashing or network partitions or RM failure. Thoughts? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740161#comment-13740161 ] Omkar Vinit Joshi commented on YARN-1064: - I think this is more intuitive the way it is than using the same prefix for all. Thoughts? > YarnConfiguration scheduler configuration constants are not consistent > -- > > Key: YARN-1064 > URL: https://issues.apache.org/jira/browse/YARN-1064 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Priority: Blocker > Fix For: 2.1.0-beta > > > Some of the scheduler configuration constants in YarnConfiguration have > RM_PREFIX and others YARN_PREFIX. For consistency we should move all under > the same prefix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740112#comment-13740112 ] Hadoop QA commented on YARN-1008: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598040/YARN-1008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1715//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1715//console This message is automatically generated. 
> MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740101#comment-13740101 ] Sandy Ryza commented on YARN-1008: -- Is there a reason for using "include-port-in-node.name" and not "include-port-in-node-name"? Also, would it make sense to turn it on by default in MiniYARNCluster? Or put some doc there to let people know about its existence? Otherwise, LGTM. > MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1008: - Attachment: YARN-1008.patch Addressed all comments. Created YARN-1064 as there are some scheduler config constants that have different prefixes. > MiniYARNCluster with multiple nodemanagers, all nodes have same key for > allocations > --- > > Key: YARN-1008 > URL: https://issues.apache.org/jira/browse/YARN-1008 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch, > YARN-1008.patch > > > While the NMs are keyed using the NodeId, the allocation is done based on the > hostname. > This makes the different nodes indistinguishable to the scheduler. > There should be an option to enabled the host:port instead just port for > allocations. The nodes reported to the AM should report the 'key' (host or > host:port). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
Alejandro Abdelnur created YARN-1064: Summary: YarnConfiguration scheduler configuration constants are not consistent Key: YARN-1064 URL: https://issues.apache.org/jira/browse/YARN-1064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.0-beta Some of the scheduler configuration constants in YarnConfiguration have RM_PREFIX and others YARN_PREFIX. For consistency we should move all under the same prefix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1004) yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type
[ https://issues.apache.org/jira/browse/YARN-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1004: - Summary: yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed with the scheduler type (was: yarn.scheduler.minimum|maximum|increment-allocation-mb should have scheduler) > yarn.scheduler.minimum|maximum|increment-allocation-mb should be prefixed > with the scheduler type > -- > > Key: YARN-1004 > URL: https://issues.apache.org/jira/browse/YARN-1004 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Priority: Blocker > Attachments: YARN-1004.patch > > > As yarn.scheduler.minimum-allocation-mb is now a scheduler-specific > configuration, and functions differently for the Fair and Capacity > schedulers, it would be less confusing for the config names to include the > scheduler names, i.e. yarn.scheduler.fair.minimum-allocation-mb, > yarn.scheduler.capacity.minimum-allocation-mb, and > yarn.scheduler.fifo.minimum-allocation-mb. > The same goes for yarn.scheduler.increment-allocation-mb, which only exists > for the Fair Scheduler, and yarn.scheduler.maximum-allocation-mb, for > consistency. > If we wish to preserve backwards compatibility, we can deprecate the old > configs to the new ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
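The backwards-compatibility idea in the last paragraph of the description (deprecating the old names in favour of the new ones) can be sketched as a lookup that prefers the new key and falls back to the old one. This is a simplified illustration over a plain map, not Hadoop's Configuration class; the class `DeprecatedKeySketch` and the method `minAllocationMb` are hypothetical, while the two key names are the ones quoted in the issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of resolving a renamed configuration key. */
class DeprecatedKeySketch {
    static final String OLD_KEY = "yarn.scheduler.minimum-allocation-mb";
    static final String NEW_KEY = "yarn.scheduler.fair.minimum-allocation-mb";

    /** Prefers the scheduler-specific key, then the old key, then the default. */
    static int minAllocationMb(Map<String, String> conf, int defaultMb) {
        String value = conf.get(NEW_KEY);
        if (value == null) {
            // Old deployments that have not migrated still work.
            value = conf.get(OLD_KEY);
        }
        return value == null ? defaultMb : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OLD_KEY, "512");
        System.out.println("resolved from old key: " + minAllocationMb(conf, 1024));
        conf.put(NEW_KEY, "256");
        System.out.println("resolved from new key: " + minAllocationMb(conf, 1024));
    }
}
```

Giving the new key precedence lets an operator migrate one setting at a time without changing behaviour for untouched files.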
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Leckie updated YARN-1063: -- Attachment: (was: YARN-732.patch) > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: trunk-win > Environment: Windows >Reporter: Kyle Leckie > Labels: security > Fix For: trunk-win > > Attachments: YARN-732.patch > > > Task isolation requires the ability to launch tasks in the context of a > particular domain user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Leckie updated YARN-1063: -- Attachment: YARN-732.patch code patch > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: trunk-win > Environment: Windows >Reporter: Kyle Leckie > Labels: security > Fix For: trunk-win > > Attachments: YARN-732.patch > > > Task isolation requires the ability to launch tasks in the context of a > particular domain user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740060#comment-13740060 ] Hadoop QA commented on YARN-867: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598022/YARN-867.1.sampleCode.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1714//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1714//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1714//console This message is automatically generated. > Isolation of failures in aux services > -- > > Key: YARN-867 > URL: https://issues.apache.org/jira/browse/YARN-867 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hitesh Shah >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-867.1.sampleCode.patch > > > Today, a malicious application can bring down the NM by sending bad data to a > service. For example, sending data to the ShuffleService such that it results > any non-IOException will cause the NM's async dispatcher to exit as the > service's INIT APP event is not handled properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
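The failure isolation the issue asks for can be sketched as below: each auxiliary service is called inside its own try/catch so that an unexpected exception from one service does not end the dispatching loop. This is a minimal illustration under stated assumptions, not the attached sample patch; the interface `AuxService`, the class `AuxIsolationSketch`, and the method `dispatchInitApp` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of isolating per-service failures. */
class AuxIsolationSketch {
    /** Stand-in for an auxiliary service that receives app-init data. */
    interface AuxService {
        void initApp(String appId, byte[] data);
    }

    /**
     * Delivers the init event to every service and returns the indices of the
     * services that threw, instead of letting the exception propagate.
     */
    static List<Integer> dispatchInitApp(List<AuxService> services, String appId, byte[] data) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < services.size(); i++) {
            try {
                services.get(i).initApp(appId, data);
            } catch (RuntimeException e) {
                // Record the failure and continue; the dispatching thread survives.
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<AuxService> services = new ArrayList<>();
        services.add((appId, data) -> { throw new IllegalArgumentException("bad data"); });
        services.add((appId, data) -> System.out.println("initialized " + appId));
        System.out.println("failed services: " + dispatchInitApp(services, "app_1", new byte[0]));
    }
}
```

A real fix would also need to decide what happens to the application whose data triggered the failure; the sketch only shows that the loop keeps running.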
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Leckie updated YARN-1063: -- Attachment: YARN-732.patch Code patch > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: trunk-win > Environment: Windows >Reporter: Kyle Leckie > Labels: security > Fix For: trunk-win > > Attachments: YARN-732.patch > > > Task isolation requires the ability to launch tasks in the context of a > particular domain user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
[ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740048#comment-13740048 ]

Omkar Vinit Joshi commented on YARN-1061:

How can the NM wait indefinitely? What is your connection timeout set to? Can you add the parameters below to your log4j.properties and check whether the call actually times out or waits indefinitely for the RM? Also, can you attach those logs once you have reproduced it?
{code}
log4j.logger.org.apache.hadoop.ipc.Server=DEBUG
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}
These configuration keys from *CommonConfigurationKeysPublic* may also help:
{code}
public static final String IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY = "ipc.client.connection.maxidletime";
/** Default value for IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY */
public static final int IPC_CLIENT_CONNECTION_MAXIDLETIME_DEFAULT = 10000; // 10s
/** See core-default.xml */
public static final String IPC_CLIENT_CONNECT_TIMEOUT_KEY = "ipc.client.connect.timeout";
/** Default value for IPC_CLIENT_CONNECT_TIMEOUT_KEY */
public static final int IPC_CLIENT_CONNECT_TIMEOUT_DEFAULT = 20000; // 20s
{code}

> NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.
>
> Key: YARN-1061
> URL: https://issues.apache.org/jira/browse/YARN-1061
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.0.5-alpha
> Reporter: Rohith Sharma K S
>
> It was observed that in one scenario the NodeManager waits indefinitely for the nodeHeartbeat response from the ResourceManager when the ResourceManager is hung.
> The NodeManager should get a timeout exception instead of waiting indefinitely.
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740042#comment-13740042 ]

Hitesh Shah commented on YARN-867:

It might be good to break this down into a set of JIRAs. The first (this JIRA itself) would just ensure that the NM does not crash. The second would address the proposed changes in the protocol and potential changes in the MR AM to use the new APIs and handle failures as needed.

> Isolation of failures in aux services
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Xuan Gong
> Priority: Critical
> Attachments: YARN-867.1.sampleCode.patch
>
> Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly.
[jira] [Assigned] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1006: --- Assignee: Xuan Gong (was: Jian He) > Nodes list web page on the RM web UI is broken > -- > > Key: YARN-1006 > URL: https://issues.apache.org/jira/browse/YARN-1006 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Xuan Gong > > The nodes web page which list all the connected nodes of the cluster is > broken. > 1. The page is not showing in correct format/style. > 2. If we restart the NM, the node list is not refreshed, but just add the new > started NM to the list. The old NMs information still remain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-867: --- Attachment: YARN-867.1.sampleCode.patch > Isolation of failures in aux services > -- > > Key: YARN-867 > URL: https://issues.apache.org/jira/browse/YARN-867 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hitesh Shah >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-867.1.sampleCode.patch > > > Today, a malicious application can bring down the NM by sending bad data to a > service. For example, sending data to the ShuffleService such that it results > any non-IOException will cause the NM's async dispatcher to exit as the > service's INIT APP event is not handled properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740009#comment-13740009 ]

Xuan Gong commented on YARN-867:

My proposal: when any auxService fails, instead of simply throwing the exception to the dispatcher, we catch it and inform the AM.

Here is how it works. We use ContainerManagementProtocol. The AM periodically sends an AuxiliaryServiceCheckRequest, with the ApplicationId as its parameter, to every ContainerManager it knows about (the period could be 3s or 5s). Each ContainerManager sends back a response saying whether any AuxiliaryService has failed for this appId, along with the related diagnostics.

On the ContainerManagerImpl side, if any of the registered AuxServices fails, instead of simply throwing the exception to the dispatcher, we catch it and save it, with the appId and the exception message, into an AuxServiceFailureMap. When a ContainerManager receives an AuxiliaryServiceCheckRequest, it looks up the appId in the AuxServiceFailureMap and reports in the response whether any AuxService has failed for that appId.

Attached is sample code for this proposal.

> Isolation of failures in aux services
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Xuan Gong
> Priority: Critical
>
> Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly.
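The failure-map idea in the YARN-867 proposal can be sketched in plain Java. This is a minimal illustration only and is not the code in YARN-867.1.sampleCode.patch: the class name, the `invoke` and `check` methods, and the use of a plain String for the application id are all assumptions made here for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch of an "AuxServiceFailureMap"; all names are illustrative. */
final class AuxServiceFailureMapSketch {

  /** A call into an auxiliary service that may throw anything. */
  interface AuxServiceCall {
    void run() throws Exception;
  }

  // appId -> diagnostics collected from failed aux-service calls
  private final Map<String, List<String>> failures = new ConcurrentHashMap<>();

  /** Runs the call; on failure, records it instead of rethrowing to the caller. */
  void invoke(String appId, String serviceName, AuxServiceCall call) {
    try {
      call.run();
    } catch (Exception e) {
      // Catch checked and runtime exceptions so the dispatcher thread survives.
      failures.computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
          .add(serviceName + ": " + e);
    }
  }

  /** What a check request for this appId would report: empty means no failures. */
  List<String> check(String appId) {
    List<String> diagnostics = failures.get(appId);
    return diagnostics == null ? Collections.emptyList() : new ArrayList<>(diagnostics);
  }

  public static void main(String[] args) {
    AuxServiceFailureMapSketch map = new AuxServiceFailureMapSketch();
    map.invoke("app_1", "shuffle", () -> { throw new IllegalStateException("bad data"); });
    map.invoke("app_2", "shuffle", () -> { /* succeeds */ });
    System.out.println("app_1 failures: " + map.check("app_1"));
    System.out.println("app_2 failures: " + map.check("app_2"));
  }
}
```

In this sketch the periodic AuxiliaryServiceCheckRequest from the AM would map onto a call to `check(appId)`; how the request and response are actually carried over ContainerManagementProtocol is left out.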
[jira] [Commented] (YARN-1059) '\n' or ' ' or '\t' should be ignored for some configuration parameters
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740005#comment-13740005 ]

Zhijie Shen commented on YARN-1059:

This is a duplicate of HADOOP-9869.

> '\n' or ' ' or '\t' should be ignored for some configuration parameters
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
> Reporter: rvller
> Priority: Minor
> Labels: newbie
>
> Here is the traceback while starting the yarn resource manager:
> 2013-08-12 12:53:29,319 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port authority:
> 10.245.1.30:9030
> (configuration property 'yarn.resourcemanager.resource-tracker.address')
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
> at org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
> at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> <configuration>
> <property>
> <name>
> yarn.resourcemanager.address
> </name>
> <value>
> 10.245.1.30:9010
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.scheduler.address
> </name>
> <value>
> 10.245.1.30:9020
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.resource-tracker.address
> </name>
> <value>
> 10.245.1.30:9030
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.admin.address
> </name>
> <value>
> 10.245.1.30:9040
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.webapp.address
> </name>
> <value>
> 10.245.1.30:9050
> </value>
> </property>
> </configuration>
[jira] [Updated] (YARN-1059) '\n' or ' ' or '\t' should be ignored for some configuration parameters
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1059: Summary: '\n' or ' ' or '\t' should be ignored for some configuration parameters (was: IllegalArgumentException while starting YARN) > '\n' or ' ' or '\t' should be ignored for some configuration parameters > --- > > Key: YARN-1059 > URL: https://issues.apache.org/jira/browse/YARN-1059 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha > Environment: Ubuntu 12.04, hadoop 2.0.5 >Reporter: rvller >Priority: Minor > Labels: newbie > > Here is the traceback while starting the yarn resourse manager: > 2013-08-12 12:53:29,319 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: > 10.245.1.30:9030 > (configuration property 'yarn.resourcemanager.resource-tracker.address') > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193) > at > org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105) > at > org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710) > And here is the yarn-site.xml: > > > > yarn.resourcemanager.address > > > 10.245.1.30:9010 > > > > > > > yarn.resourcemanager.scheduler.address > > > 10.245.1.30:9020 > > > > > > > yarn.resourcemanager.resource-tracker.address > > > 10.245.1.30:9030 > > > > > > > yarn.resourcemanager.admin.address > > > 10.245.1.30:9040 > > > > > > > yarn.resourcemanager.webapp.address > > > 10.245.1.30:9050 > > > > > > -- This message is 
automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1059) IllegalArgumentException while starting YARN
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1373#comment-1373 ] Omkar Vinit Joshi commented on YARN-1059: - Modifying title > IllegalArgumentException while starting YARN > > > Key: YARN-1059 > URL: https://issues.apache.org/jira/browse/YARN-1059 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha > Environment: Ubuntu 12.04, hadoop 2.0.5 >Reporter: rvller >Priority: Minor > Labels: newbie > > Here is the traceback while starting the yarn resourse manager: > 2013-08-12 12:53:29,319 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: > 10.245.1.30:9030 > (configuration property 'yarn.resourcemanager.resource-tracker.address') > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193) > at > org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105) > at > org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710) > And here is the yarn-site.xml: > > > > yarn.resourcemanager.address > > > 10.245.1.30:9010 > > > > > > > yarn.resourcemanager.scheduler.address > > > 10.245.1.30:9020 > > > > > > > yarn.resourcemanager.resource-tracker.address > > > 10.245.1.30:9030 > > > > > > > yarn.resourcemanager.admin.address > > > 10.245.1.30:9040 > > > > > > > yarn.resourcemanager.webapp.address > > > 10.245.1.30:9050 > > > > > > -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-1059) IllegalArgumentException while starting YARN
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739997#comment-13739997 ]

Omkar Vinit Joshi commented on YARN-1059:

Today, none of the code that reads configuration parameters ignores '\n', ' ', or '\t'. This is not very critical, so I am downgrading its priority.

> IllegalArgumentException while starting YARN
>
> Key: YARN-1059
> URL: https://issues.apache.org/jira/browse/YARN-1059
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.5-alpha
> Environment: Ubuntu 12.04, hadoop 2.0.5
> Reporter: rvller
> Priority: Minor
> Labels: newbie
>
> Here is the traceback while starting the yarn resource manager:
> 2013-08-12 12:53:29,319 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
> java.lang.IllegalArgumentException: Does not contain a valid host:port authority:
> 10.245.1.30:9030
> (configuration property 'yarn.resourcemanager.resource-tracker.address')
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193)
> at org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105)
> at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710)
> And here is the yarn-site.xml:
> <configuration>
> <property>
> <name>
> yarn.resourcemanager.address
> </name>
> <value>
> 10.245.1.30:9010
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.scheduler.address
> </name>
> <value>
> 10.245.1.30:9020
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.resource-tracker.address
> </name>
> <value>
> 10.245.1.30:9030
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.admin.address
> </name>
> <value>
> 10.245.1.30:9040
> </value>
> </property>
>
> <property>
> <name>
> yarn.resourcemanager.webapp.address
> </name>
> <value>
> 10.245.1.30:9050
> </value>
> </property>
> </configuration>
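The failure reported in YARN-1059 comes from a host:port value that carries surrounding newlines. A minimal sketch of the tolerant behaviour the issue title asks for is shown below; it uses only the JDK, and the class and method names are assumptions made for this example rather than Hadoop code. Hadoop's Configuration class also offers a getTrimmed accessor, which may be the more natural place to apply such a fix.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper: parse "host:port" while ignoring surrounding whitespace. */
final class TrimmedAddressSketch {

  static InetSocketAddress parseHostPort(String raw) {
    // String.trim() drops leading and trailing ' ', '\t', '\n' and '\r'.
    String value = raw.trim();
    int colon = value.lastIndexOf(':');
    if (colon <= 0 || colon == value.length() - 1) {
      throw new IllegalArgumentException(
          "Does not contain a valid host:port authority: " + raw);
    }
    String host = value.substring(0, colon);
    // NumberFormatException is a subclass of IllegalArgumentException.
    int port = Integer.parseInt(value.substring(colon + 1));
    // Unresolved: no DNS lookup is attempted in this sketch.
    return InetSocketAddress.createUnresolved(host, port);
  }

  public static void main(String[] args) {
    // Same shape as the value in the reported yarn-site.xml.
    InetSocketAddress addr = parseHostPort("\n10.245.1.30:9030\n");
    System.out.println(addr.getHostString() + " port " + addr.getPort());
  }
}
```

Trimming only the ends keeps the check strict: whitespace inside the value still fails to parse.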
[jira] [Updated] (YARN-1059) IllegalArgumentException while starting YARN
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1059: Labels: newbie (was: ) > IllegalArgumentException while starting YARN > > > Key: YARN-1059 > URL: https://issues.apache.org/jira/browse/YARN-1059 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha > Environment: Ubuntu 12.04, hadoop 2.0.5 >Reporter: rvller > Labels: newbie > > Here is the traceback while starting the yarn resourse manager: > 2013-08-12 12:53:29,319 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: > 10.245.1.30:9030 > (configuration property 'yarn.resourcemanager.resource-tracker.address') > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193) > at > org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105) > at > org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710) > And here is the yarn-site.xml: > > > > yarn.resourcemanager.address > > > 10.245.1.30:9010 > > > > > > > yarn.resourcemanager.scheduler.address > > > 10.245.1.30:9020 > > > > > > > yarn.resourcemanager.resource-tracker.address > > > 10.245.1.30:9030 > > > > > > > yarn.resourcemanager.admin.address > > > 10.245.1.30:9040 > > > > > > > yarn.resourcemanager.webapp.address > > > 10.245.1.30:9050 > > > > > > -- This message is automatically generated by JIRA. 
[jira] [Updated] (YARN-1059) IllegalArgumentException while starting YARN
[ https://issues.apache.org/jira/browse/YARN-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1059: Priority: Minor (was: Major) > IllegalArgumentException while starting YARN > > > Key: YARN-1059 > URL: https://issues.apache.org/jira/browse/YARN-1059 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha > Environment: Ubuntu 12.04, hadoop 2.0.5 >Reporter: rvller >Priority: Minor > Labels: newbie > > Here is the traceback while starting the yarn resourse manager: > 2013-08-12 12:53:29,319 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > java.lang.IllegalArgumentException: Does not contain a valid host:port > authority: > 10.245.1.30:9030 > (configuration property 'yarn.resourcemanager.resource-tracker.address') > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:193) > at > org.apache.hadoop.conf.Configuration.getSocketAddr(Configuration.java:1450) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.init(ResourceTrackerService.java:105) > at > org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.init(ResourceManager.java:255) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:710) > And here is the yarn-site.xml: > > > > yarn.resourcemanager.address > > > 10.245.1.30:9010 > > > > > > > yarn.resourcemanager.scheduler.address > > > 10.245.1.30:9020 > > > > > > > yarn.resourcemanager.resource-tracker.address > > > 10.245.1.30:9030 > > > > > > > yarn.resourcemanager.admin.address > > > 10.245.1.30:9040 > > > > > > > yarn.resourcemanager.webapp.address > > > 10.245.1.30:9050 > > > > > > -- This message is automatically generated by JIRA. 
[jira] [Created] (YARN-1063) Winutils needs ability to create task as domain user
Kyle Leckie created YARN-1063: - Summary: Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: trunk-win Environment: Windows Reporter: Kyle Leckie Fix For: trunk-win Task isolation requires the ability to launch tasks in the context of a particular domain user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739915#comment-13739915 ] Hadoop QA commented on YARN-292: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597996/YARN-292.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1713//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1713//console This message is automatically generated. 
> ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch, YARN-292.2.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1008) MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
[ https://issues.apache.org/jira/browse/YARN-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739913#comment-13739913 ]

Sandy Ryza commented on YARN-1008:

A few comments:

Can we call the config RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME instead of RM_SCHEDULER_USE_PORT_FOR_NODE_NAME? The latter makes it seem like we're only using the port. Also, as in yarn.scheduler.minimum-allocation-mb, can we use dashes, not periods, for the part that comes after "scheduler"? It should also start with yarn.scheduler, not yarn.resourcemanager.scheduler.

In the getNodeName doc, "diferentiate" should be "differentiate".

The whole test added to TestFairScheduler needs another space of indentation.

The finally block at the end of the test shouldn't be necessary, because we already reinitialize with a fresh config before every test.

> MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
>
> Key: YARN-1008
> URL: https://issues.apache.org/jira/browse/YARN-1008
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.1.0-beta
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: YARN-1008.patch, YARN-1008.patch, YARN-1008.patch
>
> While the NMs are keyed using the NodeId, the allocation is done based on the hostname.
> This makes the different nodes indistinguishable to the scheduler.
> There should be an option to enable using host:port instead of just the host for allocations. The nodes reported to the AM should report the 'key' (host or host:port).
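To make the naming discussion in the YARN-1008 review concrete, here is a small sketch of a node-name helper controlled by such a flag. It is not taken from the attached patches: the key string is only one spelling that would satisfy the review comments (yarn.scheduler prefix, dashes after "scheduler"), and the `getNodeName` signature is an assumption made for this example.

```java
/** Hypothetical sketch of choosing the scheduler's node key; names are illustrative. */
final class NodeNameSketch {

  // Assumed key spelling that follows the review's suggestions; not confirmed by the source.
  static final String INCLUDE_PORT_IN_NODE_NAME = "yarn.scheduler.include-port-in-node-name";

  /** Returns "host:port" when the flag is set, otherwise just the host. */
  static String getNodeName(String host, int port, boolean includePort) {
    return includePort ? host + ":" + port : host;
  }

  public static void main(String[] args) {
    // Two NodeManagers on one machine, as in a MiniYARNCluster.
    System.out.println(getNodeName("localhost", 1234, false) + " vs "
        + getNodeName("localhost", 5678, false));
    System.out.println(getNodeName("localhost", 1234, true) + " vs "
        + getNodeName("localhost", 5678, true));
  }
}
```

With the flag off both NodeManagers share the key "localhost", which is the indistinguishability the issue describes; with it on, the keys differ.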
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739900#comment-13739900 ]

Sandy Ryza commented on YARN-1024:

bq. I would also like us to have a flag that would either limit the container to the requested CPU and let it have no more even when more is available, or would let it expand to use whatever CPU was free, but would be guaranteed to get at least the YCUs requested.

YARN-810 should handle this. The plan is to make it a cluster config, but feel free to chime in there if you think it needs to be an app config.

bq. 1 YCU is very complex to measure for an application.

Agreed that YCUs are very complex to measure and set for applications, and I don't think there is any good way around this. YARN-810 will help considerably, but still won't make it close to as easy as configuring memory.

bq. although I think I would change the numbers to be total YCUs requested and minimum YCUs per core.

Because of the complexity discussed above in dealing with YCUs, I strongly believe that we should keep one of the parameters as just "number of cores", which allows a user to separate the concerns of "how much parallelism can my task take advantage of?" and "how CPU-bound is my task?". This will also give us something in common with every other cluster resource manager I have surveyed (Condor, Maui, Torque, etc.).

> Define a virtual core unambigiously
>
> Key: YARN-1024
> URL: https://issues.apache.org/jira/browse/YARN-1024
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> We need to clearly define the meaning of a virtual core unambiguously so that it's easy to migrate applications between clusters.
> For e.g. here is Amazon EC2 definition of ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*
[jira] [Updated] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-292: - Attachment: YARN-292.2.patch Updated the patch to add comments and assert in AMContainerAllocatedTransition, to justify the number of allocated containers is not zero. > ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch, YARN-292.2.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1024) Define a virtual core unambigiously
[ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739853#comment-13739853 ] Robert Joseph Evans commented on YARN-1024: --- {quote}Sorry for the longwindedness.{quote} From what people have told me you still have a long way to go before you approach me for longwindedness :). My initial gut reaction is that only having two numbers to express the request seems too simplified, but the more I think about it the more I am OK with it, although I think I would change the numbers to be total YCUs requested and minimum YCUs per core. This gives the user better visibility into how the scheduler is treating these numbers so they can better reason about them. The total YCUs is the value used for scheduling. The minimum YCUs per core is compared to the maxComputeUnitsPerCore, as was suggested, to reject a request as not possible or, in the case of a heterogeneous environment, to restrict the hosts that this container can run on. Although I am OK with the original proposal too. I would also like us to have a flag that would either limit the container to the requested CPU and let it have no more even when more is available, or would let it expand to use whatever CPU was free, but would be guaranteed to get at least the YCUs requested. This is likely something that would have to be done on a separate JIRA though. Without this I don't see a way to really get simplicity, predictability, or consistency. 1 MB of RAM is fairly simple to understand. It can be measured without too much of a problem just by running the process. Most users do a simple search for the correct value: run with the default, and if it does not work, increase the amount and run again. 1 YCU is very complex to measure for an application. If I cannot restrict a container to never use more than what was requested I cannot consistently predict how long it will take to run later. 
Without this I don't know how to answer the question I know will come up. What should I set these values to? > Define a virtual core unambigiously > --- > > Key: YARN-1024 > URL: https://issues.apache.org/jira/browse/YARN-1024 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > We need to clearly define the meaning of a virtual core unambiguously so that > it's easy to migrate applications between clusters. > For e.g. here is Amazon EC2 definition of ECU: > http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it > Essentially we need to clearly define a YARN Virtual Core (YVC). > Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the > equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*
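The two-number request discussed in the comment above (total YCUs used for scheduling, plus a minimum YCUs per core that is compared against maxComputeUnitsPerCore) could be sketched roughly as follows. This is only an illustration under stated assumptions: `YcuRequestSketch`, `YcuRequest`, `canRunOn`, and the `availableYcus` parameter are hypothetical names and are not part of any YARN API; only `maxComputeUnitsPerCore` comes from the discussion.

```java
// Hypothetical sketch; none of these types exist in YARN.
final class YcuRequestSketch {

    /** A container request expressed as the two numbers proposed in the comment. */
    static final class YcuRequest {
        final int totalYcus;       // value the scheduler would count against capacity
        final int minYcusPerCore;  // per-core floor the application needs

        YcuRequest(int totalYcus, int minYcusPerCore) {
            this.totalYcus = totalYcus;
            this.minYcusPerCore = minYcusPerCore;
        }
    }

    /**
     * Returns true if a host whose cores each provide maxComputeUnitsPerCore YCUs,
     * and which still has availableYcus unallocated, could take the request.
     */
    static boolean canRunOn(YcuRequest request, int maxComputeUnitsPerCore, int availableYcus) {
        if (request.minYcusPerCore > maxComputeUnitsPerCore) {
            return false; // the cores on this host are too slow for the request
        }
        return request.totalYcus <= availableYcus;
    }

    public static void main(String[] args) {
        YcuRequest request = new YcuRequest(8, 3);
        System.out.println(canRunOn(request, 4, 16)); // fast enough cores, enough capacity
        System.out.println(canRunOn(request, 2, 16)); // cores too slow, request rejected
    }
}
```

In a heterogeneous cluster the same check would simply be applied per host, which is how the per-core minimum ends up restricting where a container can be placed.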
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739829#comment-13739829 ] Hitesh Shah commented on YARN-1055: --- [~kkambatl] Based on the discussion, I was trying to understand what is conceived as AM failure vs RM restart vs infra-failures. It seems a bit confusing from an app developer point of view that the AM being restarted as a result of the RM restarting is considered different from the AM being restarted because the NM went down. > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
[jira] [Commented] (YARN-1044) used/min/max resources do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739802#comment-13739802 ] Sangjin Lee commented on YARN-1044: --- Sounds good. I'll submit a patch soon. > used/min/max resources do not display info in the scheduler page > > > Key: YARN-1044 > URL: https://issues.apache.org/jira/browse/YARN-1044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sangjin Lee >Priority: Minor > Labels: newbie > Attachments: screenshot.png > > > Go to the scheduler page in RM, and click any queue to display the detailed > info. You'll find that none of the resources entries (used, min, or max) > would display values. > It is because the values contain brackets ("<" and ">") and are not properly > html-escaped.
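The root cause named in the description above is that values containing "<" and ">" are written into the page without HTML escaping, so the browser treats them as a tag and shows nothing. A minimal sketch of the escaping step follows. It is not the attached patch: `HtmlEscapeSketch` and `escapeHtml` are hypothetical names, and the sample value is only assumed to resemble what the RM renders.

```java
// Minimal sketch of HTML escaping; not the YARN-1044 patch.
final class HtmlEscapeSketch {

    /** Replaces the characters that are significant in HTML with entities. */
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative value only; the exact string shown by the scheduler page is an assumption.
        System.out.println(escapeHtml("<memory:1024>")); // prints &lt;memory:1024&gt;
    }
}
```

In real code an existing, well-tested escaping utility would normally be preferable to a hand-rolled loop; the loop is used here only to keep the example self-contained.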
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739801#comment-13739801 ] Sangjin Lee commented on YARN-451: -- I agree that hadoop 1 was different as the notion of mappers and reducers was explicit from the overview and the RM works in a different way in terms of resource allocation. I am pointing out that from a user perspective there is a feature gap where one cannot quickly get a sense of relative sizes of apps/jobs. I also agree that the solution should be done in a way such that it doesn't crowd the UI and also conforms well with the current RM architecture. Thanks! > Add more metrics to RM page > --- > > Key: YARN-451 > URL: https://issues.apache.org/jira/browse/YARN-451 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Lohit Vijayarenu >Priority: Minor > > ResourceManager webUI shows list of RUNNING applications, but it does not > tell which applications are requesting more resource compared to others. With > cluster running hundreds of applications at once it would be useful to have > some kind of metric to show high-resource usage applications vs low-resource > usage ones. At the minimum showing number of containers is good option.
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739711#comment-13739711 ] Hudson commented on YARN-1060: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1518 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1518/]) YARN-1060. Two tests in TestFairScheduler are missing @Test annotation (Niranjan Singh via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Fix For: 2.3.0 > > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-337) RM handles killed application tracking URL poorly
[ https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739713#comment-13739713 ] Hudson commented on YARN-337: - SUCCESS: Integrated in Hadoop-trunk-Commit #4257 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4257/]) YARN-337. RM handles killed application tracking URL poorly. Contributed by Jason Lowe (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513888) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java > RM handles killed application tracking URL poorly > - > > Key: YARN-337 > URL: https://issues.apache.org/jira/browse/YARN-337 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Jason Lowe > Labels: usability > Attachments: YARN-337.patch > > > When the ResourceManager kills an application, it leaves the proxy URL > redirecting to the original tracking URL for the application even though the > ApplicationMaster is no longer there to service it. It should redirect it > somewhere more useful, like the RM's web page for the application, where the > user can find that the application was killed and links to the AM logs. > In addition, sometimes the AM during teardown from the kill can attempt to > unregister and provide an updated tracking URL, but unfortunately the RM has > "forgotten" the AM due to the kill and refuses to process the unregistration. 
> Instead it logs: > {noformat} > 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: > AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_01 > {noformat} > It should go ahead and process the unregistration to update the tracking URL > since the application offered it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739656#comment-13739656 ] Hudson commented on YARN-1060: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1491 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1491/]) YARN-1060. Two tests in TestFairScheduler are missing @Test annotation (Niranjan Singh via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Fix For: 2.3.0 > > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1023) [YARN-321] Webservices REST API's support for Application History
[ https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739634#comment-13739634 ] Hadoop QA commented on YARN-1023: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597960/YARN-1023-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1711//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1711//console This message is automatically generated. > [YARN-321] Webservices REST API's support for Application History > - > > Key: YARN-1023 > URL: https://issues.apache.org/jira/browse/YARN-1023 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-321 >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739621#comment-13739621 ] Hadoop QA commented on YARN-954: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12597959/YARN-954-v2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1712//console This message is automatically generated. > [YARN-321] History Service should create the webUI and wire it to > HistoryStorage > > > Key: YARN-954 > URL: https://issues.apache.org/jira/browse/YARN-954 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Devaraj K > Attachments: YARN-954-v0.patch, YARN-954-v1.patch, YARN-954-v2.patch > >
[jira] [Updated] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-954: --- Attachment: YARN-954-v2.patch > [YARN-321] History Service should create the webUI and wire it to > HistoryStorage > > > Key: YARN-954 > URL: https://issues.apache.org/jira/browse/YARN-954 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Devaraj K > Attachments: YARN-954-v0.patch, YARN-954-v1.patch, YARN-954-v2.patch > >
[jira] [Updated] (YARN-1023) [YARN-321] Webservices REST API's support for Application History
[ https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-1023: Attachment: YARN-1023-v1.patch > [YARN-321] Webservices REST API's support for Application History > - > > Key: YARN-1023 > URL: https://issues.apache.org/jira/browse/YARN-1023 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-321 >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch > >
[jira] [Commented] (YARN-1036) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/YARN-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739540#comment-13739540 ] Hudson commented on YARN-1036: -- FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #699 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/699/]) YARN-1036. Distributed Cache gives inconsistent result if cache files get deleted from task tracker. Contributed by Mayank Bansal and Ravi Prakash (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513636) * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: YARN-1036 > URL: https://issues.apache.org/jira/browse/YARN-1036 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 0.23.10 > > Attachments: YARN-1036.branch-0.23.patch, > YARN-1036.branch-0.23.patch, YARN-1036.branch-0.23.patch > > > This is a JIRA to backport MAPREDUCE-4342. I had to open a new JIRA because > that one had been closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-543) [Umbrella] NodeManager localization related issues
[ https://issues.apache.org/jira/browse/YARN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739541#comment-13739541 ] Hudson commented on YARN-543: - FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #699 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/699/]) YARN-543. Shared data structures in Public Localizer and Private Localizer are not Thread safe. Contributed by Omkar Vinit Joshi and Mit Desai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513674) * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > [Umbrella] NodeManager localization related issues > -- > > Key: YARN-543 > URL: https://issues.apache.org/jira/browse/YARN-543 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Vinod Kumar Vavilapalli > > Seeing a bunch of localization related issues being worked on, this is the > tracking ticket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1060) Two tests in TestFairScheduler are missing @Test annotation
[ https://issues.apache.org/jira/browse/YARN-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739521#comment-13739521 ] Hudson commented on YARN-1060: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #301 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/301/]) YARN-1060. Two tests in TestFairScheduler are missing @Test annotation (Niranjan Singh via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > Two tests in TestFairScheduler are missing @Test annotation > --- > > Key: YARN-1060 > URL: https://issues.apache.org/jira/browse/YARN-1060 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Niranjan Singh > Labels: newbie > Fix For: 2.3.0 > > Attachments: YARN-1060.patch > > > Amazingly, these tests appear to pass with the annotations added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739424#comment-13739424 ] Junping Du commented on YARN-292: - bq. I'll document it as the comment in AMContainerAllocatedTransition. Thanks. bq. CapacityScheduler.applications is already ConcurrentHashMap, and all the methods to access LeafQueue.applicationsMap is synchronized. Therefore, I think we don't need to change it. That's true. thx! > ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) 
> at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739378#comment-13739378 ] Zhijie Shen commented on YARN-292: -- Thanks for reviewing the patch, Junping! bq. However, I would suggest to document why at least one container is expected in allocation or adding no empty check on getContainers(). In ScheduleTransition, it is already checked that the number of allocated containers is 0, which means newlyAllocatedContainers is still empty. Therefore, AMContainerAllocatedTransition comes after ScheduleTransition, and is triggered by CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is emitted after an RMContainer is created and put into newlyAllocatedContainers. Therefore, in AMContainerAllocatedTransition, at least 1 container is expected. I'll document it as the comment in AMContainerAllocatedTransition. bq. but not address CapacityScheduler (applicationsMap should be in class of LeafQueue). CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods that access LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it. 
> ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
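The stack trace quoted above fails inside `java.util.Arrays$ArrayList.get` with index 0, which is what happens when `get(0)` is called on an empty list produced by `Arrays.asList`. The sketch below reproduces that failure mode and shows one way to make the "at least one container is expected" assumption explicit. It is not the YARN-292 patch: `FirstContainerSketch`, `firstOrFail`, and the sample identifiers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the code in RMAppAttemptImpl.
final class FirstContainerSketch {

    /** Returns the first element, or fails with a descriptive message if the list is empty. */
    static <T> T firstOrFail(List<T> allocated, String attemptId) {
        if (allocated.isEmpty()) {
            throw new IllegalStateException(
                "No container allocated for " + attemptId + " on CONTAINER_ALLOCATED");
        }
        return allocated.get(0);
    }

    public static void main(String[] args) {
        List<String> empty = Arrays.asList(new String[0]);
        try {
            empty.get(0); // same failure mode as in the stack trace
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded get(0) failed: " + e.getClass().getSimpleName());
        }
        // With at least one element the guard is transparent.
        System.out.println(firstOrFail(Arrays.asList("container_01"), "appattempt_x"));
    }
}
```

Whether an explicit check, an assert, or only a comment is appropriate depends on how strongly the invariant is guaranteed by the preceding state transitions, which is the point being debated in the thread.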
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739372#comment-13739372 ] Karthik Kambatla commented on YARN-1055: bq. In case of a network issue where the AM is running but cannot talk to the RM or say the NM on which the AM was running goes down, what knob would control handling these situations? For these two cases, I would use the AM-failure knob because the AM is "perceived" to have failed. Is there more to the question that I have totally missed? > Handle app recovery differently for AM failures and RM restart > -- > > Key: YARN-1055 > URL: https://issues.apache.org/jira/browse/YARN-1055 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla > > Ideally, we would like to tolerate container, AM, RM failures. App recovery > for AM and RM currently relies on the max-attempts config; tolerating AM > failures requires it to be > 1 and tolerating RM failure/restart requires it > to be = 1. > We should handle these two differently, with two separate configs.
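The proposal in this issue is to stop overloading the single max-attempts setting and to control AM failures and RM restarts with two separate configs. A rough sketch of what such a split could look like is given below. All names are hypothetical (`RecoveryPolicySketch`, `maxAmFailures`, `recoverOnRmRestart`, `shouldRetry`); no actual YARN configuration keys are implied, and the thread had not settled on a design.

```java
// Hypothetical sketch of two independent recovery knobs; not YARN code.
final class RecoveryPolicySketch {

    /** Why a new AM attempt is being considered. */
    enum RestartCause { AM_FAILURE, RM_RESTART }

    private final int maxAmFailures;          // hypothetical knob: AM failures tolerated
    private final boolean recoverOnRmRestart; // hypothetical knob: recover after RM restart
    private int amFailures = 0;

    RecoveryPolicySketch(int maxAmFailures, boolean recoverOnRmRestart) {
        this.maxAmFailures = maxAmFailures;
        this.recoverOnRmRestart = recoverOnRmRestart;
    }

    /** Returns true if another attempt should be launched for the given cause. */
    boolean shouldRetry(RestartCause cause) {
        if (cause == RestartCause.RM_RESTART) {
            return recoverOnRmRestart; // does not consume the AM-failure budget
        }
        amFailures++;
        return amFailures <= maxAmFailures;
    }

    public static void main(String[] args) {
        RecoveryPolicySketch policy = new RecoveryPolicySketch(1, true);
        System.out.println(policy.shouldRetry(RestartCause.RM_RESTART)); // RM restart tolerated
        System.out.println(policy.shouldRetry(RestartCause.AM_FAILURE)); // first AM failure tolerated
        System.out.println(policy.shouldRetry(RestartCause.AM_FAILURE)); // budget exhausted
    }
}
```

Following the comment above, an NM going down or a network partition that hides a running AM from the RM would be classified as `AM_FAILURE` in this sketch, since the AM is perceived to have failed.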
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739316#comment-13739316 ] Junping Du commented on YARN-292: - Also, I see you only addressed Fifo and Fair, but not CapacityScheduler (its applicationsMap should be in the LeafQueue class). Shall we apply the same change there? > ResourceManager throws ArrayIndexOutOfBoundsException while handling > CONTAINER_ALLOCATED for application attempt > > > Key: YARN-292 > URL: https://issues.apache.org/jira/browse/YARN-292 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Devaraj K >Assignee: Zhijie Shen > Attachments: YARN-292.1.patch > > > {code:xml} > 2012-12-26 08:41:15,030 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: > Calling allocate on removed or non existant application > appattempt_1356385141279_49525_01 > 2012-12-26 08:41:15,031 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type CONTAINER_ALLOCATED for applicationAttempt > application_1356385141279_49525 > java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.Arrays$ArrayList.get(Arrays.java:3381) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) > at java.lang.Thread.run(Thread.java:662) > {code}
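The top frame of the trace, `java.util.Arrays$ArrayList.get`, is consistent with calling `get(0)` on an empty array-backed list, presumably because the allocation for the already-removed application came back with no containers. The sketch below reproduces that exception in isolation and shows one possible guard. It is not the YARN-292 patch: `EmptyAllocationSketch` and `firstOrNull` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not code from RMAppAttemptImpl or from the attached patch.
final class EmptyAllocationSketch {

    /** Returns the first element, or null when the list is null or empty. */
    static <T> T firstOrNull(List<T> items) {
        return (items == null || items.isEmpty()) ? null : items.get(0);
    }

    /** Shows that an unguarded get(0) on an empty Arrays.asList list throws. */
    static boolean unguardedGetThrows() {
        List<String> empty = Arrays.asList(new String[0]);
        try {
            empty.get(0); // Arrays$ArrayList.get indexes the backing array directly
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded get(0) throws: " + unguardedGetThrows());
        List<String> none = Arrays.asList(new String[0]);
        System.out.println("guarded lookup on empty list: " + firstOrNull(none));
        System.out.println("guarded lookup on one element: "
                + firstOrNull(Arrays.asList("container_1")));
    }
}
```

A guard like this only avoids the crash in the transition. Whether the attempt should instead never receive CONTAINER_ALLOCATED after the application is removed is the scheduler-side question the comment raises.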