[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704368#comment-14704368 ]

Varun Saxena commented on YARN-4053:
------------------------------------

bq. it might be good to restrict the numeric types the metric will support. long and double sounds good to me.

Can add verification as you said.

bq. HBase already provides a facility to encode and decode between numbers and bytes

Yes, I know. As I had to prepend one byte to the byte array, I moved the logic in Bytes.toBytes to a separate method. This was done to avoid creating 2 byte arrays (one inside Bytes.toBytes and one in ATS code) and then copying the result of Bytes.toBytes into the byte array created inside the ATS code. Although this is just 8 bytes, so maybe we can do the above.

bq. Also, instead of encoding the info whether this is an integral type vs. floating type into the value, it would be better to have this information in the column qualifier.

I see an issue with having this info in the column qualifier: certain HBase filters like SingleColumnValueFilter require the exact column qualifier name, so we would again have to guess the type (similar to the current patch) when we use them. Probably we can discuss this offline and conclude there. Will send a mail.


Change the way metric values are stored in HBase Storage
--------------------------------------------------------

Key: YARN-4053
URL: https://issues.apache.org/jira/browse/YARN-4053
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-4053-YARN-2928.01.patch

Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
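The single-allocation encoding discussed above (one type-tag byte followed by the 8-byte value, written into one array rather than copying the output of Bytes.toBytes) can be sketched as follows. This is an illustrative sketch only; the class and tag names are hypothetical, not the actual YARN-4053 patch code.

```java
import java.nio.ByteBuffer;

public class MetricValueCodec {
    static final byte TAG_LONG = 0;
    static final byte TAG_DOUBLE = 1;

    // One allocation: tag byte + 8-byte big-endian payload in the same array.
    public static byte[] encode(Number value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
        if (value instanceof Double || value instanceof Float) {
            buf.put(TAG_DOUBLE).putDouble(value.doubleValue());
        } else if (value instanceof Long || value instanceof Integer
                || value instanceof Short || value instanceof Byte) {
            buf.put(TAG_LONG).putLong(value.longValue());
        } else {
            throw new IllegalArgumentException(
                "Unsupported metric type: " + value.getClass());
        }
        return buf.array();
    }

    // The tag byte tells the reader whether to interpret the payload
    // as an integral or a floating-point value.
    public static Number decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte tag = buf.get();
        return tag == TAG_DOUBLE ? (Number) buf.getDouble() : (Number) buf.getLong();
    }

    public static void main(String[] args) {
        byte[] enc = encode(42L);
        System.out.println(enc.length);          // 9
        System.out.println(decode(enc));         // 42
        System.out.println(decode(encode(2.5))); // 2.5
    }
}
```

Note this encodes the type in the value, which is exactly the trade-off debated above: it keeps column qualifiers exact-matchable by filters like SingleColumnValueFilter, at the cost of a tag byte per cell.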
[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706224#comment-14706224 ]

Rohith Sharma K S commented on YARN-4044:
-----------------------------------------

Thanks [~sunilg] for the patch. The patch mostly looks good to me. Have you verified it in a real cluster?


Running applications information changes such as movequeue is not published to TimeLine server
----------------------------------------------------------------------------------------------

Key: YARN-4044
URL: https://issues.apache.org/jira/browse/YARN-4044
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, timelineserver
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical
Attachments: 0001-YARN-4044.patch

SystemMetricsPublisher needs to expose an appUpdated API to publish any change for a running application. Events can be:
- change of queue for a running application
- change of application priority for a running application

This ticket intends to handle both RM and timeline side changes.
[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706234#comment-14706234 ]

Sunil G commented on YARN-4044:
-------------------------------

Thank you [~rohithsharma]. Yes, I have verified this in a real cluster. I will upload a few screenshots later.
[jira] [Created] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority
Sunil G created YARN-4068:
--------------------------

Summary: Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority
Key: YARN-4068
URL: https://issues.apache.org/jira/browse/YARN-4068
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Sunil G
Assignee: Sunil G

YARN-4044 supports appUpdated event changes for TimelineV1. This jira is to track and port the appUpdated changes in V2 for:
- movetoqueue
- updateAppPriority
[jira] [Commented] (YARN-3986) getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead
[ https://issues.apache.org/jira/browse/YARN-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706264#comment-14706264 ]

Hudson commented on YARN-3986:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #8334 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8334/])
YARN-3986. getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface (rohithsharmaks: rev 22de7c1dca1be63d523de833163ae51bfe638a79)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* hadoop-yarn-project/CHANGES.txt


getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead
------------------------------------------------------------------------------------------------------

Key: YARN-3986
URL: https://issues.apache.org/jira/browse/YARN-3986
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3986.01.patch, YARN-3986.02.patch, YARN-3986.03.patch

Currently getTransferredContainers is present in {{AbstractYarnScheduler}}. *But in ApplicationMasterService, while registering the AM, we call this method by typecasting the scheduler to AbstractYarnScheduler, which is incorrect.* This method should be moved to YarnScheduler, because a custom scheduler will implement YarnScheduler, not necessarily extend AbstractYarnScheduler.

As ApplicationMasterService calls getTransferredContainers by typecasting to AbstractYarnScheduler, it imposes an indirect dependency on AbstractYarnScheduler for any pluggable custom scheduler. We can move the method declaration to YarnScheduler and leave the implementation in AbstractYarnScheduler as it is.
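The refactoring can be sketched as a standard pull-up of a method declaration to the interface. This is a minimal sketch with hypothetical, simplified names (not the real YARN classes): the declaration moves to the interface, the shared implementation stays in the abstract base, and the caller no longer needs a downcast.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

interface YarnSchedulerLike {
    // Declared on the interface, so callers need no cast to the abstract class.
    List<String> getTransferredContainers(String appAttemptId);
}

abstract class AbstractSchedulerLike implements YarnSchedulerLike {
    // The shared implementation stays here, as proposed above.
    @Override
    public List<String> getTransferredContainers(String appAttemptId) {
        return Collections.emptyList();
    }
}

// A pluggable scheduler can implement the interface directly,
// without extending the abstract base class.
class CustomScheduler implements YarnSchedulerLike {
    @Override
    public List<String> getTransferredContainers(String appAttemptId) {
        return Arrays.asList("container_1");
    }
}

public class SchedulerDemo {
    public static void main(String[] args) {
        // The caller (ApplicationMasterService in the real code) works
        // against the interface type only.
        YarnSchedulerLike scheduler = new CustomScheduler();
        System.out.println(scheduler.getTransferredContainers("attempt_1"));
    }
}
```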
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706052#comment-14706052 ]

Li Lu commented on YARN-3862:
-----------------------------

bq. it would be useful to maintain separation between limiting the contents that are returned (akin to contents of SELECT in SQL) and limiting the rows that are selected (akin to the WHERE clause in SQL).

I agree we should distinguish those two use cases. Restricting our filters to be predicates on rows will work perfectly for relational databases (and SQL queries), but if we store data in our current fashion, we may also need to dynamically filter some columns, I assume? For example, we may have a column filter that selects all configs that start with yarn.timelineservice.. I think most of these column filters will work on column qualifiers but not the values.


Decide which contents to retrieve and send back in response in TimelineReader
-----------------------------------------------------------------------------

Key: YARN-3862
URL: https://issues.apache.org/jira/browse/YARN-3862
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3862-YARN-2928.wip.01.patch

Currently, we retrieve all the contents of a field if that field is specified in the query API. In the case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to return would be quite cumbersome to specify, we have to support one of the following options:
# Prefix match
# Regex
# Group the configs/metrics and query that group.

We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs.
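The SELECT-vs-WHERE distinction above can be made concrete with a toy sketch over a plain map standing in for one HBase row: a qualifier-prefix filter decides which columns come back, independently of any predicate that selects rows (in HBase itself, ColumnPrefixFilter plays this role). The yarn.timelineservice. prefix is from the comment; the specific config suffixes below are hypothetical examples.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PrefixFilterDemo {
    // Keep only columns whose qualifier starts with the given prefix,
    // i.e. filtering on column qualifiers, never on the cell values.
    static Map<String, String> selectByPrefix(Map<String, String> row, String prefix) {
        return row.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, String> configs = new TreeMap<>();
        configs.put("yarn.timelineservice.ttl", "2592000");     // hypothetical keys
        configs.put("yarn.timelineservice.writer", "hbase");
        configs.put("mapreduce.job.reduces", "10");

        // "SELECT"-style: restrict the returned columns to one prefix.
        // A "WHERE"-style limit filter would instead drop whole rows.
        System.out.println(selectByPrefix(configs, "yarn.timelineservice."));
        // prints {yarn.timelineservice.ttl=2592000, yarn.timelineservice.writer=hbase}
    }
}
```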
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705986#comment-14705986 ]

Sangjin Lee commented on YARN-3862:
-----------------------------------

Sorry it took me a while to get around to looking at this.

As for the timeline filters, strictly speaking these are filters that filter based on the column qualifiers, and *not* on the values, right? Or are we combining both types of filtering here? IMO, it would be good to limit this to filtering of columns only on column qualifiers and not on the values. I think those 2 things are conceptually separate, and it would cause confusion if they're mixed. The reason I ask is that the patch has comparison filters ({{TimelineCompareFilter}}) and operators that are related to comparisons. I'm not sure how they relate to filtering based on column qualifiers. So far we're talking about prefix match for the most part...

On a similar note, how about the filter based on the limit as suggested by [~gtCarrera9]? Are we also mixing concepts there? The filters mentioned here do not select rows but rather pick out *contents* to return (i.e. columns or cells), whereas the limit filter would be selecting rows. I chatted with Joep on this, and I personally feel that it would be useful to maintain separation between limiting the contents that are returned (akin to contents of SELECT in SQL) and limiting the rows that are selected (akin to the WHERE clause in SQL). Thoughts?
[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler
[ https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706035#comment-14706035 ]

Johan Gustavsson commented on YARN-4066:
----------------------------------------

As I don't seem to be able to edit the above comment and the tree ended up weird, I'll re-paste it below:

root
  1
    q1
      veryhigh
      high
      default
      low
      verylow


Large number of queues choke fair scheduler
-------------------------------------------

Key: YARN-4066
URL: https://issues.apache.org/jira/browse/YARN-4066
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.7.1
Reporter: Johan Gustavsson
Attachments: yarn-4066-1.patch

Due to synchronization and all the loops performed during queue creation, configuring a large number of queues (12000+) will completely choke the scheduler. To deal with this, some optimization of QueueManager.updateAllocationConfiguration(AllocationConfiguration queueConf) should be done to reduce the number of unnecessary loops. The attached patch has been tested to work with at least 96000 queues.
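Why many queues choke the update can be sketched with a dependency-free toy (this is illustrative only, not FairScheduler code): if each of the N queues being updated triggers a linear scan over all queues, the whole update is O(N^2), whereas building an index once makes each lookup O(1).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueueLookupDemo {
    // O(N) per call: doing this once per queue during an update makes the
    // whole update O(N^2), the pattern that degrades badly at 12000+ queues.
    static boolean containsByScan(List<String> names, String target) {
        for (String other : names) {
            if (other.equals(target)) return true;
        }
        return false;
    }

    // Build an index once in O(N); each subsequent lookup is O(1).
    static Map<String, Integer> buildIndex(List<String> names) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < names.size(); i++) index.put(names.get(i), i);
        return index;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 16000; i++) names.add("root.group" + i + ".default");

        // Same answer either way; only the cost per lookup differs.
        Map<String, Integer> index = buildIndex(names);
        System.out.println(containsByScan(names, "root.group15999.default")); // true
        System.out.println(index.containsKey("root.group15999.default"));     // true
    }
}
```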
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4024:
------------------------------

Attachment: YARN-4024-v6.patch

The findbugs warning is about unchecked rawtypes in AMLivelinessMonitor.java. I fixed it in the v6 patch.


YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
----------------------------------------------------------------------

Key: YARN-4024
URL: https://issues.apache.org/jira/browse/YARN-4024
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo
Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch

Currently, the YARN RM NodesListManager resolves the IP address every time a node does a heartbeat. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress.
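The idea behind avoiding per-heartbeat resolution can be sketched as a hostname-to-IP cache (a hedged sketch of the general technique, not the actual YARN-4024 patch): a slow DNS server is consulted once per distinct host instead of on every heartbeat. The resolver function is injected so the cache itself stays independent of java.net and real DNS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachingResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    public CachingResolver(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Only the first call for a host pays the (possibly slow) resolution cost;
    // subsequent heartbeats from the same host hit the cache.
    public String resolve(String host) {
        return cache.computeIfAbsent(host, resolver);
    }

    public static void main(String[] args) {
        AtomicInteger dnsCalls = new AtomicInteger();
        CachingResolver r = new CachingResolver(host -> {
            dnsCalls.incrementAndGet();        // stands in for a real DNS lookup
            return "10.0.0." + host.length();  // fake, deterministic "IP"
        });
        r.resolve("nm-1");
        r.resolve("nm-1");
        r.resolve("nm-2");
        System.out.println(dnsCalls.get());    // 2: one lookup per distinct host
    }
}
```

A real implementation would also need an expiry policy so that re-addressed nodes are eventually re-resolved; that is omitted here.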
[jira] [Commented] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705999#comment-14705999 ]

Wangda Tan commented on YARN-4055:
----------------------------------

Merged this to branch-2.


Report node resource utilization in heartbeat
---------------------------------------------

Key: YARN-4055
URL: https://issues.apache.org/jira/browse/YARN-4055
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.7.1
Reporter: Inigo Goiri
Assignee: Inigo Goiri
Fix For: 2.8.0
Attachments: YARN-4055-v0.patch, YARN-4055-v1.patch

Send the resource utilization from the node (obtained in the NodeResourceMonitor) to the RM in the heartbeat.
[jira] [Commented] (YARN-2923) Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706013#comment-14706013 ]

Naganarasimha G R commented on YARN-2923:
-----------------------------------------

Thanks [~leftnoteasy] [~vinodkv] for reviewing and committing this jira!


Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup
----------------------------------------------------------------------------------------------------

Key: YARN-2923
URL: https://issues.apache.org/jira/browse/YARN-2923
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Fix For: 2.8.0
Attachments: YARN-2923.20141204-1.patch, YARN-2923.20141210-1.patch, YARN-2923.20150328-1.patch, YARN-2923.20150404-1.patch, YARN-2923.20150517-1.patch, YARN-2923.20150817-1.patch, YARN-2923.20150818-1.patch

As part of Distributed Node Labels configuration, we need to support Node Labels being configured in yarn-site.xml. And on modification of the Node Labels configuration in yarn-site.xml, the NM should be able to get the modified Node Labels from this NodeLabelsProvider service without an NM restart.
[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler
[ https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706031#comment-14706031 ]

Johan Gustavsson commented on YARN-4066:
----------------------------------------

Basically it's a tree as follows, ranging from 1 to 16000. For each user group there is one general queue and one with weight-divided sub-queues.

root - 1 - q1 - veryhigh - high - default - low - verylow
[jira] [Updated] (YARN-3603) Application Attempts page confusing
[ https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3603:
--------------------------

Attachment: 0003-YARN-3603.patch

Rebasing patch against latest trunk.


Application Attempts page confusing
-----------------------------------

Key: YARN-3603
URL: https://issues.apache.org/jira/browse/YARN-3603
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Affects Versions: 2.8.0
Reporter: Thomas Graves
Assignee: Sunil G
Attachments: 0001-YARN-3603.patch, 0002-YARN-3603.patch, 0003-YARN-3603.patch, ahs1.png

The application attempts page (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01) is a bit confusing about what is going on. I think the table of containers there is only for running containers, and when the app is completed or killed it's empty. The table should have a label stating so. Also, the AM Container field is a link when the app is running but not when it's killed, which might be confusing. There is no link to the logs on this page, but there is one in the app attempt table when looking at http://rm:8088/cluster/app/application_1431101480046_0003
[jira] [Commented] (YARN-2923) Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705533#comment-14705533 ]

Wangda Tan commented on YARN-2923:
----------------------------------

Thanks for the update, [~Naganarasimha]. The latest patch LGTM. yarn-default.xml has wrong indentation; I will fix it while committing.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705403#comment-14705403 ]

Wangda Tan commented on YARN-4024:
----------------------------------

[~zhiguohong], thanks for the update. The patch generally looks good; could you take a look at the findbugs warning?

bq. FindBugs module:hadoop-yarn-server-resourcemanager

Sometimes it shows 0 findbugs warnings when you click on the findbugs report link, but there are some. You can go to the yarn-resourcemanager project and run mvn clean findbugs:findbugs.
[jira] [Commented] (YARN-4055) Report node resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705560#comment-14705560 ]

Wangda Tan commented on YARN-4055:
----------------------------------

[~kasha], did you forget to merge it to branch-2? I didn't find it in branch-2.
[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler
[ https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705015#comment-14705015 ]

Karthik Kambatla commented on YARN-4066:
----------------------------------------

The patch looks reasonable. Could you comment on your queue setup (depth, average breadth, etc.) for the 96,000 queues that you tested this on? Just curious.
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704926#comment-14704926 ]

MENG DING commented on YARN-1644:
---------------------------------

The findbugs warning is not related. The link given shows 0 warnings: https://builds.apache.org/job/PreCommit-YARN-Build/8884/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html. My own local tests did not show any findbugs warnings either. Thanks for the review!


RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
-----------------------------------------------------------------------------------------

Key: YARN-1644
URL: https://issues.apache.org/jira/browse/YARN-1644
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Wangda Tan
Assignee: MENG DING
Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644-YARN-1197.6.patch, YARN-1644.1.patch, YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch
[jira] [Updated] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority
[ https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4068:
--------------------------

Issue Type: Sub-task (was: Bug)
Parent: YARN-2928
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706269#comment-14706269 ]

Rohith Sharma K S commented on YARN-4014:
-----------------------------------------

bq. When 2nd or subsequent AM attempt is spawned, we are never setting the old attempt as null in SchedulerApplication, correct? Hence there is a chance that we set priority to old attempt while new attempt is getting created..

Right. Since the latest priority is reset on the attempt after the attempt gets updated in SchedulerApplication#setCurrentAttempt, I think there would NOT occur any possibility where currentAttempt has the old priority. So I believe currentAttempt need NOT be volatile. [~jianhe], could you give your opinion on this?


Support user cli interface in for Application Priority
------------------------------------------------------

Key: YARN-4014
URL: https://issues.apache.org/jira/browse/YARN-4014
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 0004-YARN-4014.patch

Track the changes for the user-RM client protocol, i.e. ApplicationClientProtocol changes and discussions, in this jira.
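The concern being debated above is a visibility question: if one thread swaps in the new attempt while another thread reads it, the reference needs some happens-before guarantee (volatile, or reads and writes under the same lock), or the reader may keep seeing the stale attempt and its old priority. A minimal sketch with hypothetical names (not the actual RM code, and whether volatile is needed in the real code is exactly what the comment discusses):

```java
public class AttemptHolder {
    // volatile publishes the swap to all reader threads; without it (or a
    // common lock), a reader thread may legally observe the stale reference.
    private volatile String currentAttempt = "attempt_1";

    public void setCurrentAttempt(String attempt) {
        currentAttempt = attempt;
    }

    public String getCurrentAttempt() {
        return currentAttempt;
    }

    public static void main(String[] args) throws InterruptedException {
        AttemptHolder holder = new AttemptHolder();
        Thread writer = new Thread(() -> holder.setCurrentAttempt("attempt_2"));
        writer.start();
        // join() itself establishes happens-before, so this read is safe
        // even without volatile; volatile matters for concurrent readers.
        writer.join();
        System.out.println(holder.getCurrentAttempt()); // attempt_2
    }
}
```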
[jira] [Updated] (YARN-4067) DRC from 2.7.1 could set negative available resource
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chang Li updated YARN-4067:
---------------------------

Description: as mentioned in YARN-4045 by [~wangda], DRC could set a negative available resource if the available resource's memory goes negative.


DRC from 2.7.1 could set negative available resource
----------------------------------------------------

Key: YARN-4067
URL: https://issues.apache.org/jira/browse/YARN-4067
Project: Hadoop YARN
Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
Attachments: YARN-4067.patch

as mentioned in YARN-4045 by [~wangda], DRC could set a negative available resource if the available resource's memory goes negative.
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705789#comment-14705789 ]

Chang Li commented on YARN-4045:
--------------------------------

[~wangda], I opened a jira, YARN-4067, regarding the DRC issue in 2.7.1, and posted a patch.


Negative avaialbleMB is being reported for root queue.
------------------------------------------------------

Key: YARN-4045
URL: https://issues.apache.org/jira/browse/YARN-4045
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Rushabh S Shah

We recently deployed 2.7 in one of our clusters. We are seeing a negative availableMB being reported for queue=root. This is from the jmx output:
{noformat}
<clusterMetrics>
  ...
  <availableMB>-163328</availableMB>
  ...
</clusterMetrics>
{noformat}
The following is the RM log:
{noformat}
2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO capacity.ParentQueue:
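The arithmetic behind the symptom in the log above can be checked directly (the reported jmx value of -163328 is from a different moment on the root queue; the numbers below are the ones visible in the log): with usedCapacity above 1.0, available = cluster - used goes negative.

```java
public class AvailableResourceDemo {
    public static void main(String[] args) {
        long clusterMB = 5316608;  // from the log: cluster=<memory:5316608, ...>
        long usedMB = 5334016;     // used=<memory:5334016, ...> after assignedContainer

        // availableMB is computed as cluster minus used, so overallocation
        // (usedCapacity > 1.0) makes it negative.
        long availableMB = clusterMB - usedMB;
        System.out.println(availableMB); // -17408

        // Sanity check: this ratio matches the log's usedCapacity=1.0032743.
        System.out.println((double) usedMB / clusterMB);
    }
}
```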
[jira] [Updated] (YARN-4067) DRC from 2.7.1 could set negative available resource
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Fix Version/s: 2.7.1 DRC from 2.7.1 could set negative available resource Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], DRC could set a negative resource if the available resource's memory goes negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) DRC from 2.7.1 could set negative available resource
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Affects Version/s: 2.7.1 DRC from 2.7.1 could set negative available resource Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], DRC could set a negative resource if the available resource's memory goes negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) DRC from 2.7.1 could set negative available resource
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Attachment: YARN-4067.patch DRC from 2.7.1 could set negative available resource Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-4067.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4067) DRC from 2.7.1 could set negative available resource
Chang Li created YARN-4067: -- Summary: DRC from 2.7.1 could set negative available resource Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler
[ https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705094#comment-14705094 ] Arun Suresh commented on YARN-4066: --- +1 pending jenkins Large number of queues choke fair scheduler --- Key: YARN-4066 URL: https://issues.apache.org/jira/browse/YARN-4066 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: Johan Gustavsson Attachments: yarn-4066-1.patch Due to synchronization and all the loops performed during queue creation, setting up a large number of queues (12000+) will completely choke the scheduler. To deal with this, QueueManager.updateAllocationConfiguration(AllocationConfiguration queueConf) should be optimized to reduce the number of unnecessary loops. The attached patch has been tested to work with at least 96000 queues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
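The choke described in YARN-4066 comes from repeated linear scans over all existing queues while each new queue is created, which is quadratic overall. A hypothetical sketch of the kind of optimization involved (names and structure are illustrative, not the actual patch): index queues by full name in a hash map so each existence check is O(1), making the creation of N queues O(N).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a queue manager that creates hierarchical queues
// (e.g. "root.pool3.q42") without rescanning the whole queue list per creation.
public class QueueIndexSketch {
    private final Map<String, List<String>> childrenByParent = new HashMap<>();
    private final Set<String> queues = new HashSet<>();

    // Create a queue by full dotted name, creating missing ancestors on the way.
    public void ensureQueue(String fullName) {
        String[] parts = fullName.split("\\.");
        StringBuilder path = new StringBuilder(parts[0]);
        queues.add(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            String parent = path.toString();
            path.append('.').append(parts[i]);
            String child = path.toString();
            if (queues.add(child)) { // O(1) membership check instead of a scan
                childrenByParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
            }
        }
    }

    public int size() { return queues.size(); }

    public static void main(String[] args) {
        QueueIndexSketch qm = new QueueIndexSketch();
        // 96000 leaf queues spread across 100 pools, mirroring the scale tested in the patch.
        for (int i = 0; i < 96000; i++) {
            qm.ensureQueue("root.pool" + (i % 100) + ".q" + i);
        }
        System.out.println(qm.size()); // prints 96101: root + 100 pools + 96000 leaves
    }
}
```

With a set-backed index this completes in well under a second, whereas a per-creation scan over tens of thousands of queues is what makes the scheduler appear to hang.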
[jira] [Commented] (YARN-2923) Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705570#comment-14705570 ] Hudson commented on YARN-2923: -- FAILURE: Integrated in Hadoop-trunk-Commit #8329 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8329/]) YARN-2923. Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup. (Naganarasimha G R) (wangda: rev fc07464d1a48b0413da5e921614430e41263fdb7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManager.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/CHANGES.txt Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup - Key: YARN-2923 URL: https://issues.apache.org/jira/browse/YARN-2923 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2923.20141204-1.patch, YARN-2923.20141210-1.patch, YARN-2923.20150328-1.patch, YARN-2923.20150404-1.patch, YARN-2923.20150517-1.patch, YARN-2923.20150817-1.patch, YARN-2923.20150818-1.patch As part of Distributed Node Labels configuration we need to support Node labels to be configured in Yarn-site.xml. And on modification of Node Labels configuration in yarn-site.xml, NM should be able to get modified Node labels from this NodeLabelsprovider service without NM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705667#comment-14705667 ] Chang Li commented on YARN-4045: Hi [~wangda], for the first case, should we check availableResource of root queue when a node gets removed? Then if available memory is negative, we proceed to unreserve some resource until the available memory of root queue becomes positive. Negative avaialbleMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root 
usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 
cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743
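The remedy Chang Li floats above, unreserving resources after a node removal until the root queue's available memory is non-negative, can be sketched as follows. This is a hypothetical, self-contained model (the method and field names are illustrative, not actual ResourceManager code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: after a node is removed, if available memory has gone
// negative, release reservations one by one until it is non-negative again
// (or no reservations remain).
public class UnreserveSketch {
    static long unreserveUntilNonNegative(long availableMB, Deque<Long> reservedMB) {
        while (availableMB < 0 && !reservedMB.isEmpty()) {
            availableMB += reservedMB.pop(); // releasing a reservation returns its memory
        }
        return availableMB;
    }

    public static void main(String[] args) {
        Deque<Long> reservations = new ArrayDeque<>();
        reservations.push(2048L);
        reservations.push(4096L);
        // A node removal left the root queue 5000 MB in the red.
        long available = unreserveUntilNonNegative(-5000L, reservations);
        System.out.println(available); // prints 1144 (-5000 + 4096 + 2048)
    }
}
```

Note the loop can still terminate with a negative value if the reservations do not cover the deficit, so callers would still need the clamping discussed in YARN-4067.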
[jira] [Updated] (YARN-4067) DRC from 2.7.1 could set negative available resource
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Attachment: (was: YARN-4067.patch) DRC from 2.7.1 could set negative available resource Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 as mentioned in YARN-4045 by [~wangda], DRC could set a negative resource if the available resource's memory goes negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Summary: available resource could be set negative (was: DRC from 2.7.1 could set negative available resource) available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 as mentioned in YARN-4045 by [~wangda], DRC could set a negative resource if the available resource's memory goes negative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Attachment: YARN-4067.patch available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax to set negative resource values to zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Description: as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax to set negative resource values to zero (was: as mentioned in YARN-4045 by [~wangda], drc could set negative resource if available resource's memory go negative.) available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax to set negative resource values to zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
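The componentwise-max capping proposed for YARN-4067 can be illustrated with a minimal self-contained model. The Resource class below is a simplified stand-in for illustration only; in YARN itself, Resources.componentwiseMax operates on real Resource objects:

```java
// Minimal sketch of capping a negative available resource at zero by taking
// the componentwise maximum against a zero resource. The Resource class here
// is a simplified stand-in, not Hadoop's org.apache.hadoop.yarn.api.records.Resource.
public class ResourceCapSketch {
    static final class Resource {
        final long memoryMB;
        final int vCores;
        Resource(long memoryMB, int vCores) { this.memoryMB = memoryMB; this.vCores = vCores; }
    }

    // Componentwise maximum: each dimension is compared independently, so a
    // negative memory value is clamped to zero without touching a healthy vCores value.
    static Resource componentwiseMax(Resource a, Resource b) {
        return new Resource(Math.max(a.memoryMB, b.memoryMB),
                            Math.max(a.vCores, b.vCores));
    }

    public static void main(String[] args) {
        // Mirrors the jmx symptom from YARN-4045: availableMB reported as -163328.
        Resource available = new Resource(-163328, 6212);
        Resource capped = componentwiseMax(available, new Resource(0, 0));
        System.out.println(capped.memoryMB + " " + capped.vCores); // prints "0 6212"
    }
}
```

The point of using a componentwise operation rather than a single comparison is that one dimension (memory) can go negative through reservation while the other (vCores) stays valid, and only the negative dimension should be clamped.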
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705878#comment-14705878 ] Chang Li commented on YARN-4045: [~leftnoteasy], sorry, was tagging to the wrong user name Negative avaialbleMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO 
capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root
[jira] [Commented] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705880#comment-14705880 ] Chang Li commented on YARN-4067: [~leftnoteasy] available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax in updateQueueStatistics to cap negative values at zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Description: as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax in updateQueueStatistics to cap negative values at zero (was: as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation, propose to use componentwiseMax to set negative value resource to zero) available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax in updateQueueStatistics to cap negative values at zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705858#comment-14705858 ] Chang Li commented on YARN-4067: [~wangda] could you please help take a look at this proposed change? Thanks available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation; propose using componentwiseMax in updateQueueStatistics to cap negative values at zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4067) available resource could be set negative
[ https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4067: --- Description: as mentioned in YARN-4045 by [~leftnoteasy], available memory could be negative due to reservation; propose using componentwiseMax in updateQueueStatistics to cap negative values at zero (was: as mentioned in YARN-4045 by [~wangda], available memory could be negative due to reservation, propose to use componentwiseMax to updateQueueStatistics in order to cap negative value to zero) available resource could be set negative Key: YARN-4067 URL: https://issues.apache.org/jira/browse/YARN-4067 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Chang Li Assignee: Chang Li Fix For: 2.7.1 Attachments: YARN-4067.patch as mentioned in YARN-4045 by [~leftnoteasy], available memory could be negative due to reservation; propose using componentwiseMax in updateQueueStatistics to cap negative values at zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3868) ContainerManager recovery for container resizing
[ https://issues.apache.org/jira/browse/YARN-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-3868: Attachment: YARN-3868-YARN-1197.5.patch Attach the latest patch to add a case for testing the recovery of container resource change in {{TestNMLeveldbStateStoreService}} ContainerManager recovery for container resizing Key: YARN-3868 URL: https://issues.apache.org/jira/browse/YARN-3868 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3868-YARN-1197.3.patch, YARN-3868-YARN-1197.4.patch, YARN-3868-YARN-1197.5.patch, YARN-3868.1.patch, YARN-3868.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3868) ContainerManager recovery for container resizing
[ https://issues.apache.org/jira/browse/YARN-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705887#comment-14705887 ] Hadoop QA commented on YARN-3868:
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 54s | Findbugs (version ) appears to be broken on YARN-1197. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 20s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 7m 17s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | | 45m 19s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12751513/YARN-3868-YARN-1197.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-1197 / 4dd004b |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8890/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8890/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8890/console |
This message was automatically generated. ContainerManager recovery for container resizing Key: YARN-3868 URL: https://issues.apache.org/jira/browse/YARN-3868 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3868-YARN-1197.3.patch, YARN-3868-YARN-1197.4.patch, YARN-3868-YARN-1197.5.patch, YARN-3868.1.patch, YARN-3868.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)