[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596089#comment-14596089 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2182/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Scrub debug logging of tokens during resource localization. --- Key: YARN-3834 URL: https://issues.apache.org/jira/browse/YARN-3834 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.8.0 Attachments: YARN-3834.001.patch During resource localization, the NodeManager logs tokens at debug level to aid troubleshooting. This includes the full token representation. Best practice is to avoid logging anything secret, even at debug level. We can improve on this by changing the logging to use a scrubbed representation of the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
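For illustration, a minimal sketch of the kind of scrubbed token logging described above. This is not the committed YARN-3834 patch; the helper name buildTokenFingerprint is an assumption, and the point is simply to log non-secret token metadata (kind, service, identifier length) instead of the token's full toString():
{code}
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public final class TokenLogScrubber {
  private TokenLogScrubber() {}

  // Log-safe summary of a token: kind, service and identifier length only,
  // never the password/secret bytes. (Hypothetical helper for illustration.)
  public static String buildTokenFingerprint(Token<? extends TokenIdentifier> token) {
    return "Token { kind: " + token.getKind()
        + ", service: " + token.getService()
        + ", identifierLength: " + token.getIdentifier().length + " }";
  }
}
{code}
At the call site, a debug statement would then log the fingerprint rather than the token itself, e.g. LOG.debug("Got token " + TokenLogScrubber.buildTokenFingerprint(token)), guarded by LOG.isDebugEnabled().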
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596173#comment-14596173 ] Ted Yu commented on YARN-3815: -- My comment is related to usage of hbase. bq. under framework_specific_metrics column family Since column family name appears in every KeyValue, it would be better to use very short column family name. e.g. f_m for framework metrics. [Aggregation] Application/Flow/User/Queue Level Aggregations Key: YARN-3815 URL: https://issues.apache.org/jira/browse/YARN-3815 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf Per previous discussions in some design documents for YARN-2928, the basic scenario is the query for stats can happen on: - Application level, expect return: an application with aggregated stats - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow - User level, expect return: aggregated stats for applications submitted by user - Queue level, expect return: aggregated stats for applications within the Queue Application states is the basic building block for all other level aggregations. We can provide Flow/User/Queue level aggregated statistics info based on application states (a dedicated table for application states is needed which is missing from previous design documents like HBase/Phoenix schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
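To illustrate the point about short column family names: the family name is stored with every KeyValue on disk and on the wire, so a one- or two-character name saves space. A hedged sketch using the HBase 0.98-era admin API; the table and family names here are made up for illustration:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ShortFamilyNameExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("timeline_app_state"));
    // "f_m" instead of "framework_specific_metrics": the family name is
    // repeated in every cell, so a short name reduces storage and I/O.
    desc.addFamily(new HColumnDescriptor("f_m"));
    admin.createTable(desc);
    admin.close();
  }
}
{code}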
[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596257#comment-14596257 ] Siqi Li commented on YARN-3176: --- Hi [~djp], can you take a look at patch v2? The checkstyle issues and test errors do not seem to apply to this patch. In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3176.v1.patch, YARN-3176.v2.patch If the child queue does not have a maxRunningApp limit, it will use the queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault is normally a small number, whereas some parent queues do have maxRunningApp set to be more than the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596129#comment-14596129 ] Junping Du commented on YARN-3815: -- Thanks [~sjlee0] and [~jrottinghuis] for the review and the detailed comments. [~jrottinghuis]'s comments are pretty long, so I could only reply to part of them and will finish the remaining parts tomorrow. :)
bq. For framework-specific metrics, I would say this falls on the individual frameworks. The framework AM usually already aggregates them in memory (consider MR job counters for example). So for them it is straightforward to write them out directly onto the YARN app entities. Furthermore, it is problematic to add them to the sub-app YARN entities and ask YARN to aggregate them to the application. Framework’s sub-app entities may not even align with YARN’s sub-app entities. For example, in case of MR, there is a reasonable one-to-one mapping between a mapper/reducer task attempt and a container, but for other applications that may not be true. Forcing all frameworks to hang values at containers may not be practical. I think it’s far easier for frameworks to write aggregated values to the YARN app entities.
The AM currently leverages YARN's AppTimelineCollector to forward entities to the backend storage, so making the AM talk directly to the backend storage is not considered safe. It is also not necessary, because the real difficulty here is aggregating framework-specific metrics at the other levels (flow, user and queue); that goes beyond the life cycle of the framework, so YARN has to take care of it. Instead of asking frameworks to handle specific metrics themselves, I would like to propose treating these metrics as anonymous: the framework would pass both the metric name and value to YARN's collector, and YARN's collector could aggregate them and store them as dynamic columns (under the framework_specific_metrics column family) in the app states table. Other (flow, user, etc.) level aggregation on framework metrics could then happen based on this.
bq. app-to-flow online aggregation. This is more or less live aggregated metrics at the flow level. This will still be based on the native HBase schema.
About flow online aggregation, I am not quite sure about the requirement yet. Do we really want real time for flow aggregated data, or would some fine-grained time interval (like 15 secs) be good enough? If we want to show some nice metrics chart for a flow, the latter should be fine. Even for real time, we don't have to aggregate everything from the raw entity table; we don't have to count metrics for finished apps again. Right?
bq. (3) time-based flow aggregation: This is different than the online aggregation in the sense that it is aggregated along the time boundary (e.g. “daily”, “weekly”, etc.). This can be based on the Phoenix schema. This can be populated in an offline fashion (e.g. running a mapreduce job).
Any special reason not to handle it in the same way as above - as an HBase coprocessor? It just sounds like a coarse-grained time interval. Right?
bq. This is another “offline” aggregation type. Also, I believe we’re talking about only time-based aggregation. In other words, we would aggregate values for users only with a well-defined time window. There won’t be a “real-time” aggregation of values, similar to the flow aggregation.
I would also call for a fine-grained time interval (close to real-time) here, because the aggregated resource metrics per user could be used for billing Hadoop usage in a shared environment (whether a private or public cloud), so users need to know more details on resource consumption, especially at random peak times.
bq. Very much agree with separation into 2 categories online versus periodic. I think this will be natural split between the native HBase tables for the former and the Phoenix approach for the latter to each emphasize their relative strengths.
I would again question the necessity of online aggregation if it means real time instead of a fine-grained time interval. Actually, as a building block, every container metric (cpu, memory, etc.) is generated at a time interval instead of in real time. As a result, we never know the exact snapshot of the whole system at a precise time; we can only try to get closer.
[Aggregation] Application/Flow/User/Queue Level Aggregations Key: YARN-3815 URL: https://issues.apache.org/jira/browse/YARN-3815 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf Per previous discussions in some design documents for YARN-2928, the basic scenario is the query for stats can happen
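As a rough illustration of the "anonymous metrics as dynamic columns" idea above, the collector could write one column per metric name using HBase increments, so metric names never need to be known in advance. This is only a sketch under assumed table/family/row-key names, not the YARN-2928 storage code:
{code}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class AppStateMetricWriter {
  private static final byte[] FAMILY = Bytes.toBytes("f_m"); // framework metrics family (assumed name)

  public static void addMetrics(String appId, Map<String, Long> metrics) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "timeline_app_state"); // assumed table name
    try {
      Increment inc = new Increment(Bytes.toBytes(appId));
      // One dynamic column per metric name, accumulated server-side.
      for (Map.Entry<String, Long> m : metrics.entrySet()) {
        inc.addColumn(FAMILY, Bytes.toBytes(m.getKey()), m.getValue());
      }
      table.increment(inc);
    } finally {
      table.close();
    }
  }
}
{code}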
[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596230#comment-14596230 ] Xuan Gong commented on YARN-3840: - [~Alexandre LINTE] Hey, could you tell us which version of Hadoop you are using? 2.7? Resource Manager web ui bug on main view after application number 9999 -- Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Centos 6.6 Java 1.7 Reporter: LINTE On the WEBUI, the global main view page: http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With the command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2902: --- Attachment: YARN-2902.03.patch Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596059#comment-14596059 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #234 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/234/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt Scrub debug logging of tokens during resource localization. --- Key: YARN-3834 URL: https://issues.apache.org/jira/browse/YARN-3834 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.8.0 Attachments: YARN-3834.001.patch During resource localization, the NodeManager logs tokens at debug level to aid troubleshooting. This includes the full token representation. Best practice is to avoid logging anything secret, even at debug level. We can improve on this by changing the logging to use a scrubbed representation of the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596463#comment-14596463 ] Robert Grandl commented on YARN-3820: - [~elgoiri], I fixed the warning because the HadoopQA javadoc check was -1. I will revert the change if HadoopQA returns +1. Collect disks usages on the node Key: YARN-3820 URL: https://issues.apache.org/jira/browse/YARN-3820 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Robert Grandl Assignee: Robert Grandl Labels: yarn-common, yarn-util Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, YARN-3820-4.patch In this JIRA we propose to collect disk usage on a node. This JIRA is part of a larger effort of monitoring resource usage on the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596522#comment-14596522 ] Varun Saxena commented on YARN-3798: Thanks [~ozawa]. Explanation given by you and subsequent discussions with [~rakeshr] helped a lot in clarifying behavior of zookeeper. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596606#comment-14596606 ] zhihai xu commented on YARN-3798: - I think we should also create a new session for SessionMovedException. We hit the SessionMovedException before; the following is the cause we found:
# The ZK client tried to connect to Leader L. The network was very slow, so before the leader processed the request, the client disconnected.
# The client then re-connected to Follower F reusing the same session ID. It was successful.
# The request in step 1 reached the leader. The leader processed it and invalidated the connection created in step 2. But the client didn't know the connection it was using had been invalidated.
# The client got SessionMovedException when it used the connection invalidated by the leader for any ZooKeeper operation.
IMHO, the only way to recover from this error at the RM side is to treat SessionMovedException like SessionExpiredException: close the current ZK client and create a new one.
ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO
[jira] [Updated] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2801: - Attachment: YARN-2801.2.patch Hi [~Naganarasimha], Thanks for your thoughtful review; responses to your suggestions:
2) There's no preemption-related documentation in Apache Hadoop yet; I suggest adding this part after we have a preemption page.
10) They're what the admin should specify. I prefer not to add default values here because defaults are always changing; they are tracked by {{yarn-default.xml}}.
12) Changed it; it should be the percentage of resources on nodes with the DEFAULT partition.
13) That's different: {{value/value}} and not specified means inherit from the parent.
18) The REST API is under development; I think we still need some time to finalize it for 2.8. I suggest adding that part later.
19) Added the CS link from the node labels page; I think it's a relatively independent feature. I suggest not referencing it from CS.
I addressed the other items in the attached patch. Please let me know your ideas. Thanks,
Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Attachments: YARN-2801.1.patch, YARN-2801.2.patch Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596717#comment-14596717 ] Jason Lowe commented on YARN-3360: -- The checkstyle comments are complaining about existing method argument lengths or the visibility of the Metrics fields. I was replicating the same style used by all other metric fields, so this is consistent with the code base. Add JMX metrics to TimelineDataManager -- Key: YARN-3360 URL: https://issues.apache.org/jira/browse/YARN-3360 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Labels: BB2015-05-TBR Attachments: YARN-3360.001.patch, YARN-3360.002.patch, YARN-3360.003.patch The TimelineDataManager currently has no metrics, outside of the standard JVM metrics. It would be very useful to at least log basic counts of method calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
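For context, a hedged sketch of the kind of metrics2 source being discussed for the TimelineDataManager; the class name and metric names are assumptions for illustration, not the actual YARN-3360 patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Timeline data manager metrics", context = "yarn")
public class TimelineDataManagerMetricsSketch {

  @Metric("getEntities calls") MutableCounterLong getEntitiesOps;
  @Metric("getEntities call time") MutableRate getEntitiesTime;
  @Metric("entities returned by getEntities") MutableCounterLong getEntitiesTotal;

  public static TimelineDataManagerMetricsSketch create() {
    // Registering the annotated source exposes the fields above via JMX.
    return DefaultMetricsSystem.instance().register("TimelineDataManagerMetrics",
        "Metrics for TimelineDataManager", new TimelineDataManagerMetricsSketch());
  }

  // Call sites would wrap each operation, roughly:
  //   long start = Time.monotonicNow();
  //   ... do the getEntities work ...
  //   metrics.getEntitiesOps.incr();
  //   metrics.getEntitiesTime.add(Time.monotonicNow() - start);
}
{code}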
[jira] [Updated] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3360: - Attachment: YARN-3360.003.patch Rebased patch on trunk Add JMX metrics to TimelineDataManager -- Key: YARN-3360 URL: https://issues.apache.org/jira/browse/YARN-3360 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Labels: BB2015-05-TBR Attachments: YARN-3360.001.patch, YARN-3360.002.patch, YARN-3360.003.patch The TimelineDataManager currently has no metrics, outside of the standard JVM metrics. It would be very useful to at least log basic counts of method calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-110) AM releases too many containers due to the protocol
[ https://issues.apache.org/jira/browse/YARN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596614#comment-14596614 ] Giovanni Matteo Fumarola commented on YARN-110: --- [~acmurthy], [~vinodkv] any updates on this? If you don't mind, can I work on this? AM releases too many containers due to the protocol --- Key: YARN-110 URL: https://issues.apache.org/jira/browse/YARN-110 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: YARN-110.patch
- The AM sends a request asking for 4 containers on host H1.
- Asynchronously, host H1 reaches the RM and gets assigned 4 containers. The RM, at this point, sets the value against H1 to zero in its aggregate request-table for all apps.
- In the meanwhile, the AM comes to need 3 more containers, for a total of 7 including the 4 from the previous request.
- Today, the AM sends the absolute number of 7 against H1 to the RM as part of its request table.
- The RM overrides its earlier value of zero against H1 with 7, and thus allocates 7 more containers.
- The AM already got 4 in this scheduling iteration, but gets 7 more, for a total of 11 instead of the required 7.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
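The over-allocation in the scenario above comes purely from the RM overwriting its zeroed entry with the AM's absolute ask. A toy walk-through of the arithmetic, with a plain map standing in for the RM's per-host request table (not scheduler code):
{code}
import java.util.HashMap;
import java.util.Map;

public class AbsoluteAskRaceDemo {
  public static void main(String[] args) {
    Map<String, Integer> rmRequestTable = new HashMap<>();

    rmRequestTable.put("H1", 4);   // AM asks for 4 containers on H1
    int allocated = 4;             // RM allocates 4 on H1 ...
    rmRequestTable.put("H1", 0);   // ... and zeroes its table entry

    // Meanwhile the AM needs 3 more, so it sends the absolute total of 7.
    rmRequestTable.put("H1", 7);   // overwrites the 0 with 7

    allocated += 7;                // RM satisfies the "new" ask of 7
    System.out.println("containers allocated = " + allocated + ", needed = 7");
    // Prints 11 allocated vs. 7 needed: 4 containers must be released later.
  }
}
{code}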
[jira] [Updated] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3842: Attachment: YARN-3842.001.patch That makes sense. The patch is also a lot simpler; it just adds a retry policy for {{NMNotYetReadyException}}, and a test. NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
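For reference, a hedged sketch of what "a retry policy for NMNotYetReadyException" can look like with Hadoop's retry framework; the retry count, sleep time, and surrounding structure are illustrative, not necessarily what the committed patch does:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.yarn.exceptions.NMNotYetReadyException;

public class NmProxyRetrySketch {
  public static RetryPolicy createPolicy() {
    // Bounded fixed-sleep retry instead of failing the launch immediately.
    RetryPolicy basePolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        10, 1000, TimeUnit.MILLISECONDS);   // counts and sleep are illustrative

    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap = new HashMap<>();
    exceptionToPolicyMap.put(NMNotYetReadyException.class, basePolicy);

    // Anything not listed in the map falls back to TRY_ONCE_THEN_FAIL here.
    return RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, exceptionToPolicyMap);
  }
}
{code}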
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596663#comment-14596663 ] Subru Krishnan commented on YARN-3800: -- Thanks [~adhoot] for the patch. I looked at it and just had a couple of comments: 1. Can we have _toResource(ReservationRequest request)_ in a Reservation utility class rather than in _InMemoryReservationAllocation_? 2. I feel we can update the constructor of _InMemoryReservationAllocation_ to take in _Map<ReservationInterval, Resource>_ instead of _Map<ReservationInterval, ReservationRequest>_ so that we do the translation only once. This should simplify the state in GreedyReservationAgent also. Simplify inmemory state for ReservationAllocation - Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3800.001.patch, YARN-3800.002.patch Instead of storing the ReservationRequest we store the Resource for allocations, as that's the only thing we need. Ultimately we convert everything to resources anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
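A minimal sketch of the utility method suggested in comment 1 above; the class name ReservationUtil is an assumption:
{code}
import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class ReservationUtil {
  private ReservationUtil() {}

  // Converts a ReservationRequest into the total Resource it represents
  // (capability of one container multiplied by the number of containers),
  // so downstream code only ever deals in Resources.
  public static Resource toResource(ReservationRequest request) {
    return Resources.multiply(request.getCapability(), request.getNumContainers());
  }
}
{code}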
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596616#comment-14596616 ] Ted Yu commented on YARN-3815: -- bq. in the spirit of readless increments as used in Tephra Readless increment feature is implemented in cdap, called delta write. Please take a look at: cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java cdap-hbase-compat-0.98//src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java The implementation uses hbase coprocessor, BTW [Aggregation] Application/Flow/User/Queue Level Aggregations Key: YARN-3815 URL: https://issues.apache.org/jira/browse/YARN-3815 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf Per previous discussions in some design documents for YARN-2928, the basic scenario is the query for stats can happen on: - Application level, expect return: an application with aggregated stats - Flow level, expect return: aggregated stats for a flow_run, flow_version and flow - User level, expect return: aggregated stats for applications submitted by user - Queue level, expect return: aggregated stats for applications within the Queue Application states is the basic building block for all other level aggregations. We can provide Flow/User/Queue level aggregated statistics info based on application states (a dedicated table for application states is needed which is missing from previous design documents like HBase/Phoenix schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596620#comment-14596620 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 48s | The applied patch generated 2 additional warning messages. | | {color:red}-1{color} | release audit | 0m 18s | The applied patch generated 4 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 18 new checkstyle issues (total was 204, now 215). | | {color:red}-1{color} | whitespace | 0m 3s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 27s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 61m 8s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 99m 39s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741096/YARN-3635.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/diffJavadocWarnings.txt | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8311/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8311/console | This message was automatically generated. 
Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
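Purely as an illustration of the proposal above (not the API that was eventually committed), a common queue-mapping hook on the scheduler side might look roughly like this, with RMAppManager calling it before running validations:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustrative only: a scheduler-agnostic hook RMAppManager could call
// before validateAndCreateResourceRequest, so validations run against the
// queue the application will actually land in after mapping.
public interface QueueMappingProvider {
  /**
   * @param requestedQueue queue named in the submission context
   * @param user           submitting user
   * @param applicationId  application being submitted
   * @return the queue the application will be placed in after mapping
   */
  String getMappedQueueForApp(String requestedQueue, String user,
      ApplicationId applicationId);
}
{code}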
[jira] [Moved] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla moved MAPREDUCE-6409 to YARN-3842: --- Target Version/s: 2.7.1 (was: 2.7.1) Affects Version/s: (was: 2.7.0) 2.7.0 Key: YARN-3842 (was: MAPREDUCE-6409) Project: Hadoop YARN (was: Hadoop Map/Reduce) NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596706#comment-14596706 ] Hadoop QA commented on YARN-2801: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 3m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | site | 1m 58s | Site compilation is broken. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 5m 26s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741138/YARN-2801.2.patch | | Optional Tests | site | | git revision | trunk / 11ac848 | | site | https://builds.apache.org/job/PreCommit-YARN-Build/8315/artifact/patchprocess/patchSiteWarnings.txt | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8315/console | This message was automatically generated. Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Attachments: YARN-2801.1.patch, YARN-2801.2.patch Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-3843: Attachment: YARN-3843.01.patch Fair Scheduler should not accept apps with space keys as queue name --- Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0, 2.5.0 Reporter: Dongwook Kwon Priority: Minor Attachments: YARN-3843.01.patch As in YARN-461, since an empty string is not a valid queue name, a queue name consisting of space characters should not be accepted either, nor should a space be accepted as a prefix or postfix, e.g.) root.test.queuename , or root.test. queuename. I have 2 specific cases that kill the RM with these space characters as part of the queue name.
1) Without placement policy (hadoop 2.4.0 and above), when a job is submitted with a space character as the queue name, e.g.) mapreduce.job.queuename=
2) With placement policy (hadoop 2.5.0 and above), once a job is submitted without a space character in the queue name, and another job is then submitted with one. e.g.) 1st time: mapreduce.job.queuename=root.test.user1 2nd time: mapreduce.job.queuename=root.test.user1
{code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
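A hedged sketch of the kind of whitespace validation being proposed; the class and method are hypothetical, and the real fix may instead reject the application in FairScheduler.assignToQueue or QueueManager rather than throwing:
{code}
public final class QueueNameValidator {
  private QueueNameValidator() {}

  // Rejects empty queue names and names whose '.'-separated components are
  // empty or contain any whitespace (leading, trailing, or internal).
  public static String validate(String queueName) {
    if (queueName == null || queueName.trim().isEmpty()) {
      throw new IllegalArgumentException("Queue name must not be empty");
    }
    for (String component : queueName.split("\\.", -1)) {
      if (component.isEmpty() || !component.matches("\\S+")) {
        throw new IllegalArgumentException(
            "Queue name component '" + component + "' is empty or contains whitespace");
      }
    }
    return queueName;
  }
}
{code}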
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596867#comment-14596867 ] Hadoop QA commented on YARN-3842: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 16s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 27s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 49m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741154/YARN-3842.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 077250d | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8317/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8317/console | This message was automatically generated. NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596886#comment-14596886 ] Naganarasimha G R commented on YARN-2801: - hi [~leftnoteasy], seems like after applying the patch mvn site is failing Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Attachments: YARN-2801.1.patch, YARN-2801.2.patch Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3842: Attachment: YARN-3842.002.patch The new patch makes the changes Karthik suggested. I also added a few comments and renamed {{isExpectingNMNotYetReadyException}} to {{shouldThrowNMNotYetReadyException}} for clarity. NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used
[ https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596837#comment-14596837 ] Ming Ma commented on YARN-2862: --- Thanks, [~rohithsharma] and [~leftnoteasy]. Yes, YARN-3410 will be useful. So admins still need to look through RM logs to identify those apps. Will it be useful to provide a new RM startup option to delete or skip such apps automatically? RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used --- Key: YARN-2862 URL: https://issues.apache.org/jira/browse/YARN-2862 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma This might be a known issue. Given FileSystemRMStateStore isn't used for HA scenario, it might not be that important, unless there is something we need to fix at RM layer to make it more tolerant to RMStore issue. When RM was hard shutdown, OS might not get a chance to persist blocks. Some of the stored application data end up with size zero after reboot. And RM didn't like that. {noformat} ls -al /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351 total 156 drwxr-xr-x.2 x y 4096 Nov 13 16:45 . drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 .. -rw-r--r--.1 x y 0 Nov 13 16:45 appattempt_1412702189634_324351_01 -rw-r--r--.1 x y 0 Nov 13 16:45 .appattempt_1412702189634_324351_01.crc -rw-r--r--.1 x y 0 Nov 13 16:45 application_1412702189634_324351 -rw-r--r--.1 x y 0 Nov 13 16:45 .application_1412702189634_324351.crc {noformat} When RM starts up {noformat} 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem opening checksum file: file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351. Ignoring exception: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at java.io.DataInputStream.readFully(DataInputStream.java:169) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501) ... 2014-11-13 17:40:48,876 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
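As a rough illustration of the "skip or flag such apps" idea, a standalone scan over the state store directory that reports zero-length application state files; this is not the FileSystemRMStateStore recovery code, and the store path argument is assumed:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ZeroLengthStateFileScan {
  public static void main(String[] args) throws IOException {
    Path appRoot = new Path(args[0]);   // e.g. .../FSRMStateRoot/RMAppRoot
    FileSystem fs = FileSystem.get(appRoot.toUri(), new Configuration());
    for (FileStatus appDir : fs.listStatus(appRoot)) {
      if (!appDir.isDirectory()) {
        continue;
      }
      for (FileStatus file : fs.listStatus(appDir.getPath())) {
        // Zero-length files are the truncated state left by a hard shutdown;
        // report them instead of letting recovery trip over an EOF/NPE.
        if (file.getLen() == 0) {
          System.out.println("Truncated state file (would be skipped/flagged): "
              + file.getPath());
        }
      }
    }
  }
}
{code}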
[jira] [Created] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
Dongwook Kwon created YARN-3843: --- Summary: Fair Scheduler should not accept apps with space keys as queue name Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0, 2.4.0 Reporter: Dongwook Kwon Priority: Minor As in YARN-461, since an empty string is not a valid queue name, a queue name consisting of space characters should not be accepted either, nor should a space be accepted as a prefix or postfix, e.g.) root.test.queuename , or root.test. queuename. I have 2 specific cases that kill the RM with these space characters as part of the queue name.
1) Without placement policy (hadoop 2.4.0 and above), when a job is submitted with a space character as the queue name, e.g.) mapreduce.job.queuename=
2) With placement policy (hadoop 2.5.0 and above), once a job is submitted without a space character in the queue name, and another job is then submitted with one. e.g.) 1st time: mapreduce.job.queuename=root.test.user1 2nd time: mapreduce.job.queuename=root.test.user1
{code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596925#comment-14596925 ] Sangjin Lee commented on YARN-3792: --- The latest patch LGTM. Once the jenkins comes back, I'll go ahead and merge it. Folks, do let me know soon if you have any other feedback. Thanks! Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, YARN-3792-YARN-2928.004.patch # encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which was happening even without the patch modifications in YARN-3044 TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression # Remove unused {{enableATSV1}} in testDisstributedShell # container metrics needs to be published only for v2 test cases of testDisstributedShell # Nullpointer was thrown in TimelineClientImpl.constructResURI when Aux service was not configured and {{TimelineClient.putObjects}} was getting invoked. # Race condition for the Application events to published and test case verification for RM's ApplicationFinished Timeline Events # Application Tags for converted to lowercase in ApplicationSubmissionContextPBimpl, hence RMTimelinecollector was not able to detect to custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3842) NMProxy should retry on NMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3842: --- Summary: NMProxy should retry on NMNotYetReadyException (was: NM restarts could lead to app failures) NMProxy should retry on NMNotYetReadyException -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596730#comment-14596730 ] Sangjin Lee commented on YARN-3792: --- Thanks [~Naganarasimha] for the update! +1 on the test failure. It appears to be an issue unrelated to the timeline service. It does seem like the whitespace is related to the patch (or in the vicinity of the patch). Could you kindly do a quick change to remove those extra spaces? Also, for findbugs, I ran findbugs against those two projects (distributed shell and resource manager). I do see several findbugs warnings, and they are not introduced by this patch but do appear to be related to the YARN-2928 work. distributed shell: {code} file classname='org.apache.hadoop.yarn.applications.distributedshell.Client'BugInstance type='DM_BOXED_PRIMITIVE_FOR_PARSING' priority='High' category='PERFORMANCE' message='Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.applications.distributedshell.Client.init(String[])' lineNumber='466'//file {code} resource manager: {code} file classname='org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher'BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='79'/BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='76'/BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='73'/BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='67'/BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='70'/BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerCreatedEvent in 
org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='82'/BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE' message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)' lineNumber='85'//file{code} It would be nice to address them (at least the one on Client.java) here, but if you're not inclined, we could do it later... Let me know how you want to proceed. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch # encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which was happening even
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3635: - Attachment: YARN-3635.5.patch Attached ver.5, fixed bunch of warnings. Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596776#comment-14596776 ] Jian He commented on YARN-3842: --- I think the latest patch is safe for 2.7.1, +1 NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596783#comment-14596783 ] Karthik Kambatla commented on YARN-3842: +1, pending Jenkins. Thanks for your review, [~jianhe]. I'll go ahead and commit this if Jenkins is fine with it. NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596842#comment-14596842 ] Dongwook Kwon commented on YARN-3843: - From my investigation, QueueMetrics does not allow space characters at the start or end of names; it simply trims them: static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults(); https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L112 https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L85 So, from the FairScheduler's point of view, root.adhoc.birvine with a trailing space is treated as a different queue from root.adhoc.birvine because it has one more character, but in QueueMetrics the names are trimmed, so the two different queue names suddenly become the same, which causes the error: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! Fair Scheduler should not accept apps with space keys as queue name --- Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0, 2.5.0 Reporter: Dongwook Kwon Priority: Minor As YARN-461, since empty string queue name is not valid, queue name with space keys such as ,should not be accepted either, also not as prefix nor postfix. e.g) root.test.queuename , or root.test. queuename I have 2 specific cases kill RM with these space keys as part of queue name. 1) Without placement policy (hadoop 2.4.0 and above), When a job is submitted with (space key) as queue name e.g) mapreduce.job.queuename= 2) With placement policy (hadoop 2.5.0 and above) Once a job is submitted without space key as queue name, and submit another job with space key. e.g) 1st time: mapreduce.job.queuename=root.test.user1 2nd time: mapreduce.job.queuename=root.test.user1 {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
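The collision described in the comment above can be reproduced with Guava's Splitter alone. A small sketch; the queue names are just examples and this is not code from the patch:
{code}
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;

public class QueueNameTrimSketch {
  // Same splitter configuration QueueMetrics uses: empty segments are dropped
  // and every segment is trimmed.
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String original = "root.adhoc.birvine";
    String withTrailingSpace = "root.adhoc.birvine ";  // distinct to the FairScheduler

    // Both split to [root, adhoc, birvine], so they map to the same metrics
    // source name and the second registration fails with
    // "Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists".
    System.out.println(Lists.newArrayList(Q_SPLITTER.split(original)));
    System.out.println(Lists.newArrayList(Q_SPLITTER.split(withTrailingSpace)));
  }
}
{code}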
[jira] [Commented] (YARN-3748) Cleanup Findbugs volatile warnings
[ https://issues.apache.org/jira/browse/YARN-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596936#comment-14596936 ] Gabor Liptak commented on YARN-3748: Any other changes needed before this can be considered for commit? Thanks Cleanup Findbugs volatile warnings -- Key: YARN-3748 URL: https://issues.apache.org/jira/browse/YARN-3748 Project: Hadoop YARN Issue Type: Bug Reporter: Gabor Liptak Priority: Minor Attachments: YARN-3748.1.patch, YARN-3748.2.patch, YARN-3748.3.patch, YARN-3748.5.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596963#comment-14596963 ] Hudson commented on YARN-3835: -- FAILURE: Integrated in Hadoop-trunk-Commit #8051 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8051/]) YARN-3835. hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml (vamsee via rkanter) (rkanter: rev 99271b762129d78c86f3c9733a24c77962b0b3f7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml Key: YARN-3835 URL: https://issues.apache.org/jira/browse/YARN-3835 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Vamsee Yarlagadda Assignee: Vamsee Yarlagadda Priority: Minor Fix For: 2.8.0 Attachments: YARN-3835.patch It looks like by default yarn is bundling core-site.xml, yarn-site.xml in test artifact of hadoop-yarn-server-resourcemanager which means that any downstream project which uses this a dependency can have a problem in picking up the user supplied/environment supplied core-site.xml, yarn-site.xml So we should ideally exclude these .xml files from being bundled into the test-jar. (Similar to YARN-1748) I also proactively looked at other YARN modules where this might be happening. {code} vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml ./hadoop-yarn/conf/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml {code} And out of these only two modules (hadoop-yarn-server-resourcemanager, hadoop-yarn-server-tests) are building test-jars. In future, if we start building test-jar of other modules, we should exclude these xml files from being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596733#comment-14596733 ] Hadoop QA commented on YARN-3842: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 5s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 49m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741131/YARN-3842.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8314/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8314/console | This message was automatically generated. NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596858#comment-14596858 ] Dongwook Kwon commented on YARN-3843: - Thanks, you're right, it's duplicated. I didn't find the other jira case, I will close it. Fair Scheduler should not accept apps with space keys as queue name --- Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0, 2.5.0 Reporter: Dongwook Kwon Priority: Minor Attachments: YARN-3843.01.patch As YARN-461, since empty string queue name is not valid, queue name with space keys such as ,should not be accepted either, also not as prefix nor postfix. e.g) root.test.queuename , or root.test. queuename I have 2 specific cases kill RM with these space keys as part of queue name. 1) Without placement policy (hadoop 2.4.0 and above), When a job is submitted with (space key) as queue name e.g) mapreduce.job.queuename= 2) With placement policy (hadoop 2.5.0 and above) Once a job is submitted without space key as queue name, and submit another job with space key. e.g) 1st time: mapreduce.job.queuename=root.test.user1 2nd time: mapreduce.job.queuename=root.test.user1 {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2801: Assignee: Wangda Tan (was: Naganarasimha G R) Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Attachments: YARN-2801.1.patch, YARN-2801.2.patch Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-2801: --- Assignee: Naganarasimha G R (was: Wangda Tan) Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Naganarasimha G R Attachments: YARN-2801.1.patch, YARN-2801.2.patch Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NMProxy should retry on NMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596946#comment-14596946 ] Hudson commented on YARN-3842: -- FAILURE: Integrated in Hadoop-trunk-Commit #8050 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8050/]) YARN-3842. NMProxy should retry on NMNotYetReadyException. (Robert Kanter via kasha) (kasha: rev 5ebf2817e58e1be8214dc1916a694a912075aa0a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/CHANGES.txt NMProxy should retry on NMNotYetReadyException -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Fix For: 2.7.1 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596957#comment-14596957 ] Hadoop QA commented on YARN-3800: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 6s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 50s | The applied patch generated 7 new checkstyle issues (total was 55, now 56). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 51m 0s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741165/YARN-3800.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fac4e04 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8318/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8318/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8318/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8318/console | This message was automatically generated. Simplify inmemory state for ReservationAllocation - Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch Instead of storing the ReservationRequest we store the Resource for allocations, as thats the only thing we need. Ultimately we convert everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596756#comment-14596756 ] Karthik Kambatla commented on YARN-3842: Thanks for the quick turnaround on this, Robert. One nit-pick on the test: would the following be more concise? {code} if (retryCount < 5) { retryCount++; if (isExpectingNMNotYetReadyException) { containerManager.setBlockNewContainerRequests(true); } else { throw new java.net.ConnectException("start container exception"); } } else { containerManager.setBlockNewContainerRequests(false); } return super.startContainers(requests); {code} NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596848#comment-14596848 ] zhihai xu commented on YARN-3843: - Hi [~dongwook], thanks for reporting this issue. I think this issue was fixed at YARN-3241. Fair Scheduler should not accept apps with space keys as queue name --- Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0, 2.5.0 Reporter: Dongwook Kwon Priority: Minor As YARN-461, since empty string queue name is not valid, queue name with space keys such as ,should not be accepted either, also not as prefix nor postfix. e.g) root.test.queuename , or root.test. queuename I have 2 specific cases kill RM with these space keys as part of queue name. 1) Without placement policy (hadoop 2.4.0 and above), When a job is submitted with (space key) as queue name e.g) mapreduce.job.queuename= 2) With placement policy (hadoop 2.5.0 and above) Once a job is submitted without space key as queue name, and submit another job with space key. e.g) 1st time: mapreduce.job.queuename=root.test.user1 2nd time: mapreduce.job.queuename=root.test.user1 {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3792: Attachment: YARN-3792-YARN-2928.004.patch Hi [~sjlee0], Corrected the whitespace and the findbugs issue in Client.java and attached a patch for them; the remaining warnings do not seem to be a problem, and fixing them would require adding unnecessary checks. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, YARN-3792-YARN-2928.004.patch # encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which was happening even without the patch modifications in YARN-3044 TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression # Remove unused {{enableATSV1}} in testDisstributedShell # container metrics needs to be published only for v2 test cases of testDisstributedShell # Nullpointer was thrown in TimelineClientImpl.constructResURI when Aux service was not configured and {{TimelineClient.putObjects}} was getting invoked. # Race condition for the Application events to published and test case verification for RM's ApplicationFinished Timeline Events # Application Tags for converted to lowercase in ApplicationSubmissionContextPBimpl, hence RMTimelinecollector was not able to detect to custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596894#comment-14596894 ] Hadoop QA commented on YARN-3635: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 25s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 18 new checkstyle issues (total was 204, now 215). | | {color:red}-1{color} | whitespace | 0m 3s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 30s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 14s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741145/YARN-3635.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 077250d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8316/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8316/console | This message was automatically generated. Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. 
One issue with doing this in a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
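One way to picture the proposal is a scheduler-agnostic hook that RMAppManager calls to resolve the final queue before running its validations. The interface below is purely illustrative; the name, method, and parameters are hypothetical and not part of the actual YarnScheduler API.
{code}
// Hypothetical sketch of a common queue-mapping hook; not the real API.
public interface QueuePlacementPolicy {
  /**
   * Resolve the queue an application will actually run in, applying any
   * scheduler-specific mapping rules (user/group placement, nested queues, ...).
   * Validations such as maximum-allocation and default-node-label-expression
   * checks should then run against the returned queue, not the requested one.
   *
   * @param requestedQueue queue named in the application submission context
   * @param user           submitting user
   * @return the resolved queue name
   */
  String resolveQueue(String requestedQueue, String user);
}
{code}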
[jira] [Updated] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-3705: --- Attachment: YARN-3705.002.patch I've attached 002 addressing the whitespace warnings. TestWorkPreservingRMRestart is not related to the code path the patch fixes. forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-3705.001.patch, YARN-3705.002.patch Executing {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode makes the ResourceManager standby while keeping the state of the ActiveStandbyElector. It should make the elector quit and rejoin in order to enable other candidates to be promoted; otherwise, forcemanual transition should not be allowed in automatic-failover mode, in order to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
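The first option in the description, quitting and rejoining the election on a forced transition so another candidate can take over, can be sketched roughly as below. The elector interface and method names here are placeholders, not the actual ActiveStandbyElector API.
{code}
// Rough sketch of the "quit and rejoin" option; all names are placeholders.
interface LeaderElector {
  void quitElection();
  void joinElection();
}

class ForcedStandbySketch {
  private final boolean automaticFailoverEnabled;
  private final LeaderElector elector;

  ForcedStandbySketch(boolean automaticFailoverEnabled, LeaderElector elector) {
    this.automaticFailoverEnabled = automaticFailoverEnabled;
    this.elector = elector;
  }

  void transitionToStandbyForced() {
    if (automaticFailoverEnabled) {
      // Leave and re-enter the election so another RM can become active,
      // instead of staying registered as a candidate while already standby.
      elector.quitElection();
      elector.joinElection();
    }
    // ... demotion of the local active services would follow here
  }
}
{code}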
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596765#comment-14596765 ] Robert Kanter commented on YARN-3842: - I had sort of just split {{startContainers}} into two sections (one for each part of the test), but this is a lot more concise. I'll do that. NM restarts could lead to app failures -- Key: YARN-3842 URL: https://issues.apache.org/jira/browse/YARN-3842 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3800: Attachment: YARN-3800.002.patch Addressed feedback Simplify inmemory state for ReservationAllocation - Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch Instead of storing the ReservationRequest we store the Resource for allocations, as thats the only thing we need. Ultimately we convert everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon resolved YARN-3843. - Resolution: Duplicate Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Fair Scheduler should not accept apps with space keys as queue name --- Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0, 2.5.0 Reporter: Dongwook Kwon Priority: Minor Fix For: 2.8.0 Attachments: YARN-3843.01.patch As YARN-461, since empty string queue name is not valid, queue name with space keys such as ,should not be accepted either, also not as prefix nor postfix. e.g) root.test.queuename , or root.test. queuename I have 2 specific cases kill RM with these space keys as part of queue name. 1) Without placement policy (hadoop 2.4.0 and above), When a job is submitted with (space key) as queue name e.g) mapreduce.job.queuename= 2) With placement policy (hadoop 2.5.0 and above) Once a job is submitted without space key as queue name, and submit another job with space key. e.g) 1st time: mapreduce.job.queuename=root.test.user1 2nd time: mapreduce.job.queuename=root.test.user1 {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR! org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment
[ https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596910#comment-14596910 ] Naganarasimha G R commented on YARN-2801: - Hi [~leftnoteasy], After escaping the links, it seems like it's getting applied. A few nits: * ??User need configure how many resources?? = {{User need configure how much resource of each partition}} * the points in the note after the configuration section need to come as a list * ??application can use following Java APIs?? = ??Application can use following Java APIs?? Apart from these, the rest seems fine! Documentation development for Node labels requirment Key: YARN-2801 URL: https://issues.apache.org/jira/browse/YARN-2801 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Gururaj Shetty Assignee: Wangda Tan Attachments: YARN-2801.1.patch, YARN-2801.2.patch Documentation needs to be developed for the node label requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3001) RM dies because of divide by zero
[ https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595423#comment-14595423 ] Hui Zheng commented on YARN-3001: - The only non-INFO log is the following (the failure was so sudden that there is no other WARN or ERROR around it). There are several tens of thousands of jobs per day. {code} 2015-06-21 09:53:44,696 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.ArithmeticException: / by zero at org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1335) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignNodeLocalContainers(LeafQueue.java:1185) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1136) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:557) at java.lang.Thread.run(Thread.java:724) 2015-06-21 09:53:44,696 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {code} RM dies because of divide by zero - Key: YARN-3001 URL: https://issues.apache.org/jira/browse/YARN-3001 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: hoelog Assignee: Rohith Sharma K S RM dies because of divide by zero exception.
{code} 2014-12-31 21:27:05,022 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.ArithmeticException: / by zero at org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599) at java.lang.Thread.run(Thread.java:745) 2014-12-31 21:27:05,023 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
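Both stack traces end in computeAvailableContainers dividing the available resource by the required one, so a zero "required" value kills the scheduler event dispatcher. A guarded version of that arithmetic could look like the sketch below; this is illustrative only, not the actual DefaultResourceCalculator code.
{code}
// Sketch: available/required container arithmetic with a guard against a
// zero (or negative) requirement, instead of letting ArithmeticException
// propagate out of the scheduler's event handler.
final class AvailableContainersSketch {
  static int computeAvailableContainers(long availableMemoryMb, long requiredMemoryMb) {
    if (requiredMemoryMb <= 0) {
      // A request for zero memory fits "infinitely" many times; returning a
      // large value (or rejecting such requests earlier) avoids the crash.
      return Integer.MAX_VALUE;
    }
    return (int) (availableMemoryMb / requiredMemoryMb);
  }
}
{code}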
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595749#comment-14595749 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #236 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/236/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt Scrub debug logging of tokens during resource localization. --- Key: YARN-3834 URL: https://issues.apache.org/jira/browse/YARN-3834 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.8.0 Attachments: YARN-3834.001.patch During resource localization, the NodeManager logs tokens at debug level to aid troubleshooting. This includes the full token representation. Best practice is to avoid logging anything secret, even at debug level. We can improve on this by changing the logging to use a scrubbed representation of the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
LINTE created YARN-3840: --- Summary: Resource Manager web ui bug on main view after application number 9999 Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Centos 6.6 Java 1.7 Reporter: LINTE On the WEBUI, the global main view page: http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595805#comment-14595805 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Yarn-trunk #966 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/966/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt Scrub debug logging of tokens during resource localization. --- Key: YARN-3834 URL: https://issues.apache.org/jira/browse/YARN-3834 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.8.0 Attachments: YARN-3834.001.patch During resource localization, the NodeManager logs tokens at debug level to aid troubleshooting. This includes the full token representation. Best practice is to avoid logging anything secret, even at debug level. We can improve on this by changing the logging to use a scrubbed representation of the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595826#comment-14595826 ] Devaraj K commented on YARN-3840: - Thanks [~Alexandre LINTE] for reporting the issue. Can you paste the exception if you see anything in the RM UI or in the RM logs? Resource Manager web ui bug on main view after application number 9999 -- Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Centos 6.6 Java 1.7 Reporter: LINTE On the WEBUI, the global main view page: http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages
[ https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595457#comment-14595457 ] Hadoop QA commented on YARN-3826: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 42s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 9m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 44s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 61m 36s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 106m 18s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740355/YARN-3826.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6c7a9d5 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8306/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8306/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8306/console | This message was automatically generated. Race condition in ResourceTrackerService: potential wrong diagnostics messages -- Key: YARN-3826 URL: https://issues.apache.org/jira/browse/YARN-3826 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3826.01.patch Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which can be called concurrently, the static {{resync}} and {{shutdown}} may have wrong diagnostics messages in some cases. On the other side, these static members can hardly save any memory, since the normal heartbeat responses are created for each heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3768: Attachment: YARN-3768.001.patch Index out of range exception with environment variables without values -- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Attachments: YARN-3768.000.patch, YARN-3768.001.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
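The String.split behavior behind this bug is easy to demonstrate in isolation. The sketch below is illustrative only and is not taken from the patch; it shows why parsing NAME=VALUE pairs with the default split drops an empty value, and how a split limit keeps it.
{code}
// Illustrative only: Java's String.split drops trailing empty strings by default,
// so "NAME=" yields a one-element array and parts[1] throws
// ArrayIndexOutOfBoundsException. A non-zero limit keeps the empty value.
public class SplitDemo {
  public static void main(String[] args) {
    String env = "JAVA_HOME=";               // variable declared without a value

    String[] parts = env.split("=");
    System.out.println(parts.length);        // 1 -> parts[1] would throw

    String[] kept = env.split("=", -1);
    System.out.println(kept.length);         // 2, kept[1] is ""

    String[] limited = env.split("=", 2);    // also safe, and keeps '=' inside values
    System.out.println(limited[1]);          // ""
  }
}
{code}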
[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages
[ https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595717#comment-14595717 ] Hadoop QA commented on YARN-3826: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 43s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 7s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740355/YARN-3826.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8308/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8308/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8308/console | This message was automatically generated. Race condition in ResourceTrackerService: potential wrong diagnostics messages -- Key: YARN-3826 URL: https://issues.apache.org/jira/browse/YARN-3826 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3826.01.patch Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which can be called concurrently, the static {{resync}} and {{shutdown}} may have wrong diagnostics messages in some cases. On the other side, these static members can hardly save any memory, since the normal heartbeat responses are created for each heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595495#comment-14595495 ] zhihai xu commented on YARN-3768: - Hi [~xgong], thanks for the review. I uploaded a new patch, YARN-3768.001.patch, in which I added a test case to verify that bad environment variables are skipped. As for keeping trailing empty strings, that depends on whether an environment variable with an empty value is a valid use case. MAPREDUCE-5965 adds an option that configures an environment variable with an empty value when stream.jobconf.truncate.limit is 0. So an environment variable with an empty value may indeed be a valid use case. Index out of range exception with environment variables without values -- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Attachments: YARN-3768.000.patch, YARN-3768.001.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595552#comment-14595552 ] Hadoop QA commented on YARN-3768: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 5s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 53s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 40m 4s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740968/YARN-3768.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8307/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8307/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8307/console | This message was automatically generated. Index out of range exception with environment variables without values -- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Attachments: YARN-3768.000.patch, YARN-3768.001.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3798: -- Attachment: YARN-3798-2.7.002.patch ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at
[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager
[ https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596630#comment-14596630 ] Hadoop QA commented on YARN-3360: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 25s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 29s | The applied patch generated 19 new checkstyle issues (total was 7, now 26). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 3m 10s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 39m 36s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741115/YARN-3360.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8313/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8313/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8313/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8313/console | This message was automatically generated. Add JMX metrics to TimelineDataManager -- Key: YARN-3360 URL: https://issues.apache.org/jira/browse/YARN-3360 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Labels: BB2015-05-TBR Attachments: YARN-3360.001.patch, YARN-3360.002.patch, YARN-3360.003.patch The TimelineDataManager currently has no metrics, outside of the standard JVM metrics. It would be very useful to at least log basic counts of method calls, time spent in those calls, and number of entities/events involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
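For context on what such instrumentation could look like, here is a hedged sketch using Hadoop's metrics2 library, which exposes registered sources over JMX. The class name, metric names, and the create/record helpers are hypothetical; they are not the classes introduced by the attached patch.
{code}
// Hedged sketch only: counting calls and measuring latency with metrics2.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Timeline data manager metrics", context = "yarn")
public class TimelineDataManagerMetricsSketch {
  @Metric("getEntities calls") MutableCounterLong getEntitiesOps;
  @Metric("getEntities call time") MutableRate getEntitiesTime;
  @Metric("entities returned") MutableCounterLong entitiesReturned;

  static TimelineDataManagerMetricsSketch create() {
    // Registering the annotated source makes it visible over JMX.
    return DefaultMetricsSystem.instance().register(
        "TimelineDataManagerMetricsSketch", "Timeline data manager metrics",
        new TimelineDataManagerMetricsSketch());
  }

  // Called around each getEntities request.
  void recordGetEntities(long startTimeMs, int numEntities) {
    getEntitiesOps.incr();
    getEntitiesTime.add(System.currentTimeMillis() - startTimeMs);
    entitiesReturned.incr(numEntities);
  }
}
{code}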
[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596327#comment-14596327 ] Robert Kanter commented on YARN-3835: - +1 hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml Key: YARN-3835 URL: https://issues.apache.org/jira/browse/YARN-3835 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Vamsee Yarlagadda Assignee: Vamsee Yarlagadda Priority: Minor Attachments: YARN-3835.patch It looks like by default yarn is bundling core-site.xml, yarn-site.xml in test artifact of hadoop-yarn-server-resourcemanager which means that any downstream project which uses this a dependency can have a problem in picking up the user supplied/environment supplied core-site.xml, yarn-site.xml So we should ideally exclude these .xml files from being bundled into the test-jar. (Similar to YARN-1748) I also proactively looked at other YARN modules where this might be happening. {code} vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml ./hadoop-yarn/conf/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml {code} And out of these only two modules (hadoop-yarn-server-resourcemanager, hadoop-yarn-server-tests) are building test-jars. In future, if we start building test-jar of other modules, we should exclude these xml files from being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3635: - Attachment: YARN-3635.4.patch Sorry for my late response, [~vinodkv]. I just got some bandwidth to do the update. The attached ver.4 addresses most of your comments: queue-placement-rules is now a separate module in the RM, the scheduler initializes it, and RMAppManager uses it to do queue placement. The defined interfaces are not exactly the same as you suggested; I put in the minimal set of interfaces I thought were needed. You can take a look at {{org.apache.hadoop.yarn.server.resourcemanager.placement}} for details. The ver.4 patch also makes the original CapacityScheduler.QueueMapping become a rule: UserGroupPlacementRule. Thoughts? Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch Currently, both the fair and capacity schedulers support queue mapping, which lets the scheduler change the queue of an application after it is submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the mapped queue before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
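As a rough illustration of the placement-rule idea, here is a minimal sketch of what a pluggable rule chain could look like. The interface and class names are hypothetical and are not the interfaces defined in the ver.4 patch; a rule either returns a target queue or falls through to the next rule.
{code}
// Hypothetical sketch only, not the patch's actual placement interfaces.
public interface QueuePlacementRuleSketch {
  /**
   * @return the queue the application should be placed in, or null if this rule
   *         does not apply and the next rule in the chain should be consulted.
   */
  String getQueueForApp(String requestedQueue, String user);
}

// Example rule in the spirit of CapacityScheduler's user/group queue mappings.
class UserMappingRuleSketch implements QueuePlacementRuleSketch {
  private final java.util.Map<String, String> userToQueue;

  UserMappingRuleSketch(java.util.Map<String, String> userToQueue) {
    this.userToQueue = userToQueue;
  }

  @Override
  public String getQueueForApp(String requestedQueue, String user) {
    return userToQueue.get(user);  // null = fall through to the next rule
  }
}
{code}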
[jira] [Updated] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3835: Target Version/s: 2.8.0 hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml Key: YARN-3835 URL: https://issues.apache.org/jira/browse/YARN-3835 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Vamsee Yarlagadda Assignee: Vamsee Yarlagadda Priority: Minor Attachments: YARN-3835.patch It looks like by default yarn is bundling core-site.xml, yarn-site.xml in test artifact of hadoop-yarn-server-resourcemanager which means that any downstream project which uses this a dependency can have a problem in picking up the user supplied/environment supplied core-site.xml, yarn-site.xml So we should ideally exclude these .xml files from being bundled into the test-jar. (Similar to YARN-1748) I also proactively looked at other YARN modules where this might be happening. {code} vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml ./hadoop-yarn/conf/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml {code} And out of these only two modules (hadoop-yarn-server-resourcemanager, hadoop-yarn-server-tests) are building test-jars. In future, if we start building test-jar of other modules, we should exclude these xml files from being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596337#comment-14596337 ] Hadoop QA commented on YARN-2902: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 36s | The applied patch generated 25 new checkstyle issues (total was 168, now 187). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 24s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 43m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741076/YARN-2902.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8309/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8309/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8309/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8309/console | This message was automatically generated. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
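To make the cleanup gap concrete, the following is a minimal, self-contained sketch (simplified types, not the NodeManager's LocalResourcesTracker) of a cache-cleanup predicate that also reclaims zero-reference DOWNLOADING entries once they have been stuck past a grace period, which is one possible way to avoid the orphaned state described above.
{code}
// Illustrative sketch only: when is a cached resource removable during a cleanup scan?
class CachedResourceSketch {
  enum State { DOWNLOADING, LOCALIZED, FAILED }

  State state;
  int refCount;
  long lastStateChangeMs;

  boolean isRemovableAt(long nowMs, long downloadingGraceMs) {
    if (refCount > 0) {
      return false;                      // still in use by some container
    }
    if (state == State.LOCALIZED) {
      return true;                       // the normal cache-cleanup case
    }
    // Orphaned download: nobody references it and it has made no progress
    // for longer than the grace period.
    return state == State.DOWNLOADING
        && nowMs - lastStateChangeMs > downloadingGraceMs;
  }
}
{code}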
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596481#comment-14596481 ] Jian He commented on YARN-1963: --- I think we need to move this forward. Overall, I prefer numeric priority to label-based priority because the former is simpler and more flexible if a user wants to define a wide range of priorities; it needs no extra configs, and users do not need to be re-educated about the mapping every time it changes. Also, one problem is that if we refresh the priority mapping while existing long-running jobs are already running at a certain priority, how do we map the old priority range to the new one? In addition, if everyone runs applications at “VERY_HIGH” priority, the “HIGH” priority, though named “HIGH”, is not really the “HIGH” priority any more; it actually becomes the “LOWEST” priority. My point is that the importance of a priority only makes sense when compared with its peers. In that sense, I think adding a utility that surfaces how applications are distributed across each priority, so that users can reason about where to place an application, may be more useful than adding a static naming mapping that conveys the relative importance of priorities by name. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values
[ https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596443#comment-14596443 ] Gera Shegalov commented on YARN-3768: - Instead of executing two regexes (first directly via Pattern p = Pattern.compile(Shell.getEnvironmentVariableRegex()) and then via split), can we simply match with a single regex? We can use a capture group to get the value. Index out of range exception with environment variables without values -- Key: YARN-3768 URL: https://issues.apache.org/jira/browse/YARN-3768 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.5.0 Reporter: Joe Ferner Assignee: zhihai xu Attachments: YARN-3768.000.patch, YARN-3768.001.patch Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range exception occurs if an environment variable is encountered without a value. I believe this occurs because java will not return empty strings from the split method. Similar to this http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
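As a sketch of the single-regex idea: one matcher pass can yield both the variable name and its (possibly empty) value through capture groups, so no follow-up split is needed. The pattern below is a stand-in for illustration only; it is not Shell.getEnvironmentVariableRegex().
{code}
// Illustrative only: capture groups give name and value in a single pass.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvParseSketch {
  private static final Pattern ENV_ENTRY =
      Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)=([^,]*)");

  public static void main(String[] args) {
    Matcher m = ENV_ENTRY.matcher("JAVA_HOME=/usr/java,EMPTY_VAR=,FOO=bar");
    while (m.find()) {
      String name = m.group(1);
      String value = m.group(2);   // "" for variables declared without a value
      System.out.println(name + " -> [" + value + "]");
    }
  }
}
{code}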
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596461#comment-14596461 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740206/YARN-3798-branch-2.7.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8310/console | This message was automatically generated. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596470#comment-14596470 ] Hadoop QA commented on YARN-3798: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741098/YARN-3798-2.7.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 445b132 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8312/console | This message was automatically generated. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01
[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler
[ https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596441#comment-14596441 ] Jian He commented on YARN-3790: --- lgtm, thanks [~zxu] and [~rohithsharma] TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler Key: YARN-3790 URL: https://issues.apache.org/jira/browse/YARN-3790 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, test Reporter: Rohith Sharma K S Assignee: zhihai xu Attachments: YARN-3790.000.patch Failure trace is as follows {noformat} Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) Time elapsed: 6.502 sec FAILURE! java.lang.AssertionError: expected:6144 but was:8192 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3820) Collect disks usages on the node
[ https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596451#comment-14596451 ] Inigo Goiri commented on YARN-3820: --- You may want to exclude the change in CommonNodeLabelsManager.java as it's not related to this patch. Collect disks usages on the node Key: YARN-3820 URL: https://issues.apache.org/jira/browse/YARN-3820 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Robert Grandl Assignee: Robert Grandl Labels: yarn-common, yarn-util Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, YARN-3820-4.patch In this JIRA we propose to collect disks usages on a node. This JIRA is part of a larger effort of monitoring resource usages on the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
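For background on what collecting disk usage on a node can involve, here is a hedged sketch that samples per-device activity from /proc/diskstats on Linux. It is illustrative only and is not the collector added by the attached patches; the device-name filter in particular is a crude assumption.
{code}
// Hedged sketch only: sample sectors read/written per device from /proc/diskstats.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class DiskStatsSketch {
  public static void main(String[] args) throws IOException {
    List<String> lines =
        Files.readAllLines(Paths.get("/proc/diskstats"), StandardCharsets.UTF_8);
    for (String line : lines) {
      // Fields: major, minor, device, reads completed, reads merged, sectors read,
      // ms reading, writes completed, writes merged, sectors written, ...
      String[] f = line.trim().split("\\s+");
      if (f.length < 10 || !f[2].matches("sd[a-z]+|nvme\\d+n\\d+")) {
        continue;  // crude filter: skip partitions, loop devices, etc.
      }
      long sectorsRead = Long.parseLong(f[5]);
      long sectorsWritten = Long.parseLong(f[9]);
      System.out.printf("%s: read %d sectors, wrote %d sectors%n",
          f[2], sectorsRead, sectorsWritten);
    }
  }
}
{code}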
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596437#comment-14596437 ] Giovanni Matteo Fumarola commented on YARN-3116: Thanks [~zjshen] for quickly reviewing the patch and for your comments. 1. I agree that ContainerTokenIdentifier would be a better place to do it so that we keep the flag internal, but the ContainerTokenIdentifier is created before the state transition in RMAppAttempt that sets the AM flag in RMContainer. I can try to recreate the ContainerTokenIdentifier at AM launch, but that looks unwieldy. Do you have any suggestions on how to do it more cleanly? 2. Again a good observation; I'll add this in the next iteration of the patch, based on your suggestion for (1) above. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether the container is an AM container (we can do it on the RM). This information is missing, so we worked around it by considering the container with ID _01 to be the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine whether a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596987#comment-14596987 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~zxu] In the case of SessionMovedException, I think the zk client should automatically retry connecting to another zk server with the same session id, without creating a new session. If we create a new session for SessionMovedException, we'll face the same issue that Bibin and Varun reported. With the new patch, SessionMovedException is handled within the same session. After we get SessionMovedException, the zk client in ZKRMStateStore waits for the specified period and retries the operation. At that point, the zk server should detect that the session has moved and close the client connection, as the ZooKeeper documentation mentions: http://zookeeper.apache.org/doc/r3.4.0/zookeeperProgrammers.html#ch_zkSessions {quote} When the delayed packet arrives at the first server, the old server detects that the session has moved, and closes the client connection. {quote} If this behaviour is not the same as described, we should fix ZooKeeper. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at
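A minimal sketch of the retry policy described above, assuming simplified types (this is not the actual ZKRMStateStore code, and recreateZkSession() is a hypothetical placeholder): only a session-expired error triggers a new ZooKeeper session, while connection-loss and session-moved errors are retried on the existing session after a pause.
{code}
// Hedged sketch only: retry ZK operations without discarding the session
// unless the session has actually expired.
import java.util.concurrent.Callable;
import org.apache.zookeeper.KeeperException;

abstract class ZkRetrySketch {
  abstract void recreateZkSession() throws Exception;  // hypothetical hook

  <T> T runWithRetries(Callable<T> op, int maxRetries, long retryIntervalMs)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (KeeperException.SessionExpiredException e) {
        // The only case in which a brand-new session is required.
        if (attempt >= maxRetries) {
          throw e;
        }
        recreateZkSession();
      } catch (KeeperException.ConnectionLossException
          | KeeperException.SessionMovedException e) {
        // Transient: keep the existing session and retry the operation.
        if (attempt >= maxRetries) {
          throw e;
        }
        Thread.sleep(retryIntervalMs);
      }
    }
  }
}
{code}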
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596994#comment-14596994 ] Hadoop QA commented on YARN-3792: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 29s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 59s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 8m 10s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 11s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 49s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 17s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 115m 20s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741171/YARN-3792-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 8c036a1 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8319/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8319/console | This message was automatically generated. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, YARN-3792-YARN-2928.004.patch # encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which was happening even without the patch modifications in YARN-3044 TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression # Remove unused {{enableATSV1}} in testDisstributedShell # container metrics needs to be published only for v2 test cases of testDisstributedShell # Nullpointer was thrown in
[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596989#comment-14596989 ] Hadoop QA commented on YARN-3705: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 5m 41s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 50m 55s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 54s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741178/YARN-3705.002.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / fac4e04 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8320/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8320/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8320/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8320/console | This message was automatically generated. forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-3705.001.patch, YARN-3705.002.patch Executing {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode makes ResouceManager standby while keeping the state of ActiveStandbyElector. It should make elector to quit and rejoin in order to enable other candidates to promote, otherwise forcemanual transition should not be allowed in automatic-failover mode in order to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3800: Attachment: YARN-3800.003.patch fixed checkstyle Simplify inmemory state for ReservationAllocation - Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch, YARN-3800.003.patch Instead of storing the ReservationRequest we store the Resource for allocations, as thats the only thing we need. Ultimately we convert everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-3705: --- Attachment: YARN-3705.003.patch The test failure is relevant. ResourceManager#handleTransitionToStandBy is expected to be used only when automatic failover is enabled. I am attaching 003, which addresses the non-automatic-failover case too. forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-3705.001.patch, YARN-3705.002.patch, YARN-3705.003.patch Executing {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode makes the ResourceManager standby while keeping the state of the ActiveStandbyElector. It should make the elector quit and rejoin so that other candidates can be promoted; otherwise, forcemanual transition should not be allowed in automatic-failover mode, to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596990#comment-14596990 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~vinodkv] the patch is only applied to branch-2.7 because ZKRMStateStrore of 2.8 or later uses Apache Curator. I'm running test locally under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager, so I'll report the result manually. Double checking is welcome. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at
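To make the branch split in the comment above concrete: per the comment, the 2.8+ ZKRMStateStore is built on Apache Curator, which retries recoverable errors such as CONNECTIONLOSS and manages the ZooKeeper session internally, so the manual session handling being patched here only exists on branch-2.7. The following is a generic Curator usage sketch, not the actual ZKRMStateStore code; the connect string, retry settings, and znode path are illustrative.
{code}
// Generic Apache Curator usage sketch (not the actual ZKRMStateStore code).
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181",                      // assumed ZK quorum for the sketch
        new ExponentialBackoffRetry(1000, 3)); // base sleep 1s, up to 3 retries
    client.start();
    try {
      // Curator transparently retries recoverable ZK errors for this call.
      client.create().creatingParentsIfNeeded()
          .forPath("/rmstore/sketch", "state".getBytes("UTF-8"));
    } finally {
      client.close();
    }
  }
}
{code}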
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596997#comment-14596997 ] Tsuyoshi Ozawa commented on YARN-3798: -- After zk server closes the client, zk client in ZKRMStateStore will accept CONNECTIONLOSS and handle it without creating new session. ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED --- Key: YARN-3798 URL: https://issues.apache.org/jira/browse/YARN-3798 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Assignee: Varun Saxena Priority: Blocker Attachments: RM.log, YARN-3798-2.7.002.patch, YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch RM going down with NoNode exception during create of znode for appattempt *Please find the exception logs* {code} 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2015-06-09 10:09:44,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2015-06-09 10:09:44,886 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-06-09 10:09:44,887 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed out ZK retries. Giving up! 2015-06-09 10:09:44,887 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating appAttempt: appattempt_1433764310492_7152_01 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) at
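A hedged sketch of the retry behavior described in the comment above — this is not the actual ZKRMStateStore code; the {{runWithRetries}} helper and {{SessionRestarter}} hook below are stand-ins used only to show the distinction between the two error types.
{code}
// Sketch: on CONNECTIONLOSS the client keeps its existing session and simply retries;
// only SESSIONEXPIRED warrants establishing a brand-new ZooKeeper session.
import java.util.concurrent.Callable;
import org.apache.zookeeper.KeeperException;

public class ZkRetrySketch {
  interface SessionRestarter { void restartSession() throws Exception; }

  static <T> T runWithRetries(Callable<T> op, SessionRestarter restarter, int maxRetries)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (KeeperException.ConnectionLossException e) {
        // The session may still be alive on the server side; retry with the same session.
        if (attempt >= maxRetries) throw e;
      } catch (KeeperException.SessionExpiredException e) {
        // Only now is a new session created.
        restarter.restartSession();
        if (attempt >= maxRetries) throw e;
      }
    }
  }
}
{code}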
[jira] [Updated] (YARN-3841) [Storage implementation] Create HDFS backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3841: - Summary: [Storage implementation] Create HDFS backing storage implementation for ATS writes (was: [Storage abstraction] Create HDFS backing storage implementation for ATS writes) [Storage implementation] Create HDFS backing storage implementation for ATS writes -- Key: YARN-3841 URL: https://issues.apache.org/jira/browse/YARN-3841 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa HDFS backing storage is useful for the following scenarios. 1. For Hadoop clusters which don't run HBase. 2. As a fallback from HBase when the HBase cluster is temporarily unavailable. Quoting the ATS design document of YARN-2928: {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
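As a rough illustration of the buffering idea quoted in the description — a minimal sketch under stated assumptions, not the proposed writer-plugin interface; the class name, per-application file layout, and line-per-entity serialization are all assumptions made for the example.
{code}
// Sketch only: buffer serialized timeline entities as lines in an HDFS file so they
// can be replayed into the primary (HBase) store once it comes back online.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTimelineBufferSketch implements AutoCloseable {
  private final FSDataOutputStream out;

  public HdfsTimelineBufferSketch(Configuration conf, String bufferDir, String appId)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    // One buffer file per application; the path layout is an assumption for the sketch.
    this.out = fs.create(new Path(bufferDir, appId + ".buffer"), true);
  }

  public void write(String serializedEntity) throws Exception {
    out.write((serializedEntity + "\n").getBytes(StandardCharsets.UTF_8));
    out.hflush(); // make the buffered write durable enough to replay later
  }

  @Override
  public void close() throws Exception {
    out.close();
  }
}
{code}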
[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597189#comment-14597189 ] Hadoop QA commented on YARN-3800: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 5s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 10m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 28s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 0s | The applied patch generated 1 new checkstyle issues (total was 54, now 49). | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 58s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 45m 33s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 94m 11s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12741211/YARN-3800.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 99271b7 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8321/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8321/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8321/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8321/console | This message was automatically generated. Simplify inmemory state for ReservationAllocation - Key: YARN-3800 URL: https://issues.apache.org/jira/browse/YARN-3800 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3800.001.patch, YARN-3800.002.patch, YARN-3800.002.patch, YARN-3800.003.patch Instead of storing the ReservationRequest we store the Resource for allocations, as thats the only thing we need. Ultimately we convert everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597187#comment-14597187 ] Hadoop QA commented on YARN-3838: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 34s | The applied patch generated 1 new checkstyle issues (total was 39, now 40). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 2s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 66m 50s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740917/0001-YARN-3838.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 99271b7 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8322/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8322/console | This message was automatically generated. Rest API failing when ip configured in RM address in secure https mode -- Key: YARN-3838 URL: https://issues.apache.org/jira/browse/YARN-3838 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 0001-YARN-3838.patch, 0002-YARN-3810.patch Steps to reproduce === 1. Configure hadoop.http.authentication.kerberos.principal as below:
{code:xml}
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_h...@hadoop.com</value>
</property>
{code}
2. In the RM web address, also configure the IP address.
3. Start up the RM and call the REST API of the RM: {{curl -i -k --insecure --negotiate -u : https://<IP>/ws/v1/cluster/info}} *Actual:* the REST API fails: {code} 2015-06-16 19:03:49,845 DEBUG org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
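For comparison, a hedged configuration sketch (the host name, port, and realm below are placeholders, not values from this report): Hadoop substitutes the host name of the configured address for {{_HOST}} in the SPNEGO principal, so a hostname-based RM web address can resolve to a principal that is present in the keytab, whereas an IP address typically cannot.
{code:xml}
<!-- Hedged example only: host name, port, and realm are placeholders. -->
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>rm-host.example.com:8090</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
{code}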
[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597176#comment-14597176 ] Masatake Iwasaki commented on YARN-3705: bq. ResourceManager#handleTransitionToStandBy is expected to be used only when automatic failover is enabled. This was not true: it checks {{isHAEnabled}}, not {{isAutomaticFailoverEnabled}}. {{ResourceManager#handleTransitionToStandBy}} is a no-op if {{RMContext#isHAEnabled}} is false.
{code}
public void handleTransitionToStandBy() {
  if (rmContext.isHAEnabled()) {
    try {
      // Transition to standby and reinit active services
      LOG.info("Transitioning RM to Standby mode");
      transitionToStandby(true);
      adminService.resetLeaderElection();
      return;
    } catch (Exception e) {
      LOG.fatal("Failed to transition RM to Standby mode.");
      ExitUtil.terminate(1, e);
    }
  }
}
{code}
It seems strange that doing nothing in transitionToStandby when {{isHAEnabled}} is false affects the HA tests... forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: YARN-3705.001.patch, YARN-3705.002.patch, YARN-3705.003.patch Executing {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode makes the ResourceManager standby while keeping the state of the ActiveStandbyElector. It should make the elector quit and rejoin so that other candidates can be promoted; otherwise, forcemanual transition should not be allowed in automatic-failover mode, to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
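For readers following along, a hedged sketch of the behavior the issue asks for — the elector type and method names below are stand-ins, not the actual RM, ActiveStandbyElector, or EmbeddedElectorService API, and the real patches in the attachments are authoritative.
{code}
// Sketch of the intended behavior: a forced transition to standby should also make the
// elector give up leadership and re-enter the election so another RM candidate can win.
public class ForcedStandbySketch {
  /** Minimal stand-in for the embedded leader elector. */
  interface ElectorSketch {
    void quitElection();    // release leadership / the election znode
    void rejoinElection();  // become a candidate again
  }

  private final ElectorSketch elector;

  ForcedStandbySketch(ElectorSketch elector) {
    this.elector = elector;
  }

  void transitionToStandbyForced(boolean automaticFailoverEnabled) {
    // ... transition internal RM services to standby here ...
    if (automaticFailoverEnabled) {
      // Without this, the RM goes standby while still holding its elector state.
      elector.quitElection();
      elector.rejoinElection();
    }
  }
}
{code}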
[jira] [Updated] (YARN-3841) [Storage abstraction] Create HDFS backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3841: - Description: HDFS backing storage is useful for following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when HBase cluster is temporary unavailable. Quoting ATS design document of YARN-2928: {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} was: HDFS backing storage is useful for following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when HBase cluster is temporary unavailable. {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} [Storage abstraction] Create HDFS backing storage implementation for ATS writes --- Key: YARN-3841 URL: https://issues.apache.org/jira/browse/YARN-3841 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa HDFS backing storage is useful for following scenarios. 1. For Hadoop clusters which don't run HBase. 2. For fallback from HBase when HBase cluster is temporary unavailable. Quoting ATS design document of YARN-2928: {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3841) [Storage abstraction] Create HDFS backing storage implementation for ATS writes
Tsuyoshi Ozawa created YARN-3841: Summary: [Storage abstraction] Create HDFS backing storage implementation for ATS writes Key: YARN-3841 URL: https://issues.apache.org/jira/browse/YARN-3841 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa HDFS backing storage is useful for the following scenarios. 1. For Hadoop clusters which don't run HBase. 2. As a fallback from HBase when the HBase cluster is temporarily unavailable. {quote} In the case the HBase storage is not available, the plugin should buffer the writes temporarily (e.g. HDFS), and flush them once the storage comes back online. Reading and writing to hdfs as the backup storage could potentially use the HDFS writer plugin unless the complexity of generalizing the HDFS writer plugin for this purpose exceeds the benefits of reusing it here. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595896#comment-14595896 ] LINTE commented on YARN-3840: - Hi, there is no Java stack trace for this bug. I think the property yarn.resourcemanager.max-completed-applications is the cause (default value is 10000), but it doesn't work properly. Maybe yarn.resourcemanager.max-completed-applications only takes effect on the ResourceManager GUI. Regards, Resource Manager web ui bug on main view after application number 9999 -- Key: YARN-3840 URL: https://issues.apache.org/jira/browse/YARN-3840 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: Centos 6.6 Java 1.7 Reporter: LINTE On the WEBUI, the global main view page http://resourcemanager:8088/cluster/apps doesn't display applications over 9999. With the command line it works (# yarn application -list). Regards, Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
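For reference, the property can be raised explicitly in yarn-site.xml. A hedged example follows: the value shown is purely illustrative (10000 is the shipped default), and whether the web UI actually honors it is exactly what this report questions.
{code:xml}
<!-- Illustrative value only; 10000 is the default for this property. -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <value>20000</value>
</property>
{code}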
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595971#comment-14595971 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2164 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2164/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Scrub debug logging of tokens during resource localization. --- Key: YARN-3834 URL: https://issues.apache.org/jira/browse/YARN-3834 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.8.0 Attachments: YARN-3834.001.patch During resource localization, the NodeManager logs tokens at debug level to aid troubleshooting. This includes the full token representation. Best practice is to avoid logging anything secret, even at debug level. We can improve on this by changing the logging to use a scrubbed representation of the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595984#comment-14595984 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #225 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/225/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java Scrub debug logging of tokens during resource localization. --- Key: YARN-3834 URL: https://issues.apache.org/jira/browse/YARN-3834 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.1 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.8.0 Attachments: YARN-3834.001.patch During resource localization, the NodeManager logs tokens at debug level to aid troubleshooting. This includes the full token representation. Best practice is to avoid logging anything secret, even at debug level. We can improve on this by changing the logging to use a scrubbed representation of the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)