[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596089#comment-14596089
 ] 

Hudson commented on YARN-3834:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2182 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2182/])
YARN-3834. Scrub debug logging of tokens during resource localization. 
Contributed by Chris Nauroth (xgong: rev 
6c7a9d502a633b5aca75c9798f19ce4a5729014e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


 Scrub debug logging of tokens during resource localization.
 ---

 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.8.0

 Attachments: YARN-3834.001.patch


 During resource localization, the NodeManager logs tokens at debug level to 
 aid troubleshooting.  This includes the full token representation.  Best 
 practice is to avoid logging anything secret, even at debug level.  We can 
 improve on this by changing the logging to use a scrubbed representation of 
 the token.
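 For illustration only, a minimal sketch of what a scrubbed representation could
 look like (this is not the actual patch; the helper class and log wording are
 hypothetical, and only the token kind, service and identifier length are
 printed):
 {code}
 import org.apache.hadoop.security.Credentials;
 import org.apache.hadoop.security.token.Token;
 import org.apache.hadoop.security.token.TokenIdentifier;

 // Hypothetical helper: log only a token's non-secret fields, instead of the
 // full toString(), which may include sensitive material.
 public final class TokenLogging {
   private TokenLogging() {}

   public static String scrub(Token<? extends TokenIdentifier> token) {
     // Kind and service identify the token; identifier/password stay out of logs.
     return "Token { kind: " + token.getKind()
         + ", service: " + token.getService()
         + ", identifierLength: " + token.getIdentifier().length + " }";
   }

   public static void logAtDebug(org.apache.commons.logging.Log log,
       Credentials creds) {
     if (log.isDebugEnabled()) {
       for (Token<? extends TokenIdentifier> t : creds.getAllTokens()) {
         log.debug("Localizer token: " + scrub(t));
       }
     }
   }
 }
 {code}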



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations

2015-06-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596173#comment-14596173
 ] 

Ted Yu commented on YARN-3815:
--

My comment is related to usage of hbase.
bq. under framework_specific_metrics column family
Since the column family name appears in every KeyValue, it would be better to 
use a very short column family name, e.g. f_m for framework metrics.
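For illustration, a minimal sketch of that suggestion using the HBase 0.98-era
admin API; the {{app_states}} table name is a placeholder, not the actual
timeline schema:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateAppStatesTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // The family name is repeated in every KeyValue on disk and on the wire,
    // so a short family like "f_m" keeps per-cell overhead small.
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("app_states"));
    desc.addFamily(new HColumnDescriptor("f_m"));   // framework metrics
    admin.createTable(desc);
    admin.close();
  }
}
{code}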

 [Aggregation] Application/Flow/User/Queue Level Aggregations
 

 Key: YARN-3815
 URL: https://issues.apache.org/jira/browse/YARN-3815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: Timeline Service Nextgen Flow, User, Queue Level 
 Aggregations (v1).pdf


 Per previous discussions in some design documents for YARN-2928, the basic 
 scenario is the query for stats can happen on:
 - Application level, expect return: an application with aggregated stats
 - Flow level, expect return: aggregated stats for a flow_run, flow_version 
 and flow 
 - User level, expect return: aggregated stats for applications submitted by 
 user
 - Queue level, expect return: aggregated stats for applications within the 
 Queue
 Application states is the basic building block for all other level 
 aggregations. We can provide Flow/User/Queue level aggregated statistics info 
 based on application states (a dedicated table for application states is 
 needed which is missing from previous design documents like HBase/Phoenix 
 schema design). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent

2015-06-22 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596257#comment-14596257
 ] 

Siqi Li commented on YARN-3176:
---

Hi [~djp], can you take a look at patch v2? The checkstyle issues and test 
errors do not seem to apply to this patch.

 In Fair Scheduler, child queue should inherit maxApp from its parent
 

 Key: YARN-3176
 URL: https://issues.apache.org/jira/browse/YARN-3176
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-3176.v1.patch, YARN-3176.v2.patch


 If the child queue does not have a maxRunningApps limit, it will use 
 queueMaxAppsDefault. This behavior is not quite right, since 
 queueMaxAppsDefault is normally a small number, whereas some parent queues 
 have maxRunningApps set to more than the default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations

2015-06-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596129#comment-14596129
 ] 

Junping Du commented on YARN-3815:
--

Thanks [~sjlee0] and [~jrottinghuis] for the detailed review and good comments. 
[~jrottinghuis]'s comments are pretty long, so I could only reply to part of 
them; I will finish the remaining parts tomorrow. :)

bq. For framework-specific metrics, I would say this falls on the individual 
frameworks. The framework AM usually already aggregates them in memory 
(consider MR job counters for example). So for them it is straightforward to 
write them out directly onto the YARN app entities. Furthermore, it is 
problematic to add them to the sub-app YARN entities and ask YARN to aggregate 
them to the application. Framework’s sub-app entities may not even align with 
YARN’s sub-app entities. For example, in case of MR, there is a reasonable 
one-to-one mapping between a mapper/reducer task attempt and a container, but 
for other applications that may not be true. Forcing all frameworks to hang 
values at containers may not be practical. I think it’s far easier for 
frameworks to write aggregated values to the YARN app entities.
The AM currently leverages YARN's AppTimelineCollector to forward entities to 
the backend storage, so making the AM talk directly to the backend storage is 
not considered safe. It is also not necessary, because the real difficulty here 
is aggregating framework-specific metrics at the other levels (flow, user and 
queue); that goes beyond the life cycle of the framework, so YARN has to take 
care of it. Instead of asking frameworks to handle their specific metrics 
themselves, I would like to propose treating these metrics as anonymous: the 
framework would pass both the metric name and value to YARN's collector, and 
the collector could aggregate them and store them as dynamic columns (under the 
framework_specific_metrics column family) in the app states table. Other (flow, 
user, etc.) level aggregation on framework metrics could then happen based on 
this.
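To make this concrete, a rough sketch (not the actual collector code) of how
aggregated framework metrics could be stored as dynamic columns under a short
family; the table name, family name and row-key layout are placeholders:
{code}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FrameworkMetricsWriter {
  private static final byte[] FAMILY = Bytes.toBytes("f_m"); // framework metrics family

  // Each aggregated metric becomes a dynamic column: qualifier = metric name.
  public static void writeAppMetrics(Configuration conf, String appId,
      Map<String, Long> aggregatedMetrics) throws Exception {
    HTable table = new HTable(conf, TableName.valueOf("app_states"));
    try {
      Put put = new Put(Bytes.toBytes(appId));
      for (Map.Entry<String, Long> m : aggregatedMetrics.entrySet()) {
        put.add(FAMILY, Bytes.toBytes(m.getKey()), Bytes.toBytes(m.getValue()));
      }
      table.put(put);
    } finally {
      table.close();
    }
  }
}
{code}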

bq. app-to-flow online aggregation. This is more or less live aggregated 
metrics at the flow level. This will still be based on the native HBase schema.
About flow online aggregation, I am not quite sure about the requirement yet. 
Do we really want real time for flow-aggregated data, or would some fine-grained 
time interval (like 15 secs) be good enough? If we want to show some nice 
metrics charts for a flow, that should be fine. Even for real time, we don't 
have to aggregate everything from the raw entity table; we don't have to count 
metrics again for finished apps, do we?

bq. (3) time-based flow aggregation: This is different than the online 
aggregation in the sense that it is aggregated along the time boundary (e.g. 
“daily”, “weekly”, etc.). This can be based on the Phoenix schema. This can be 
populated in an offline fashion (e.g. running a mapreduce job).
Any special reason not to handle it the same way as above, as an HBase 
coprocessor? It just sounds like a coarser-grained time interval, doesn't it?

bq. This is another “offline” aggregation type. Also, I believe we’re talking 
about only time-based aggregation. In other words, we would aggregate values 
for users only with a well-defined time window. There won’t be a “real-time” 
aggregation of values, similar to the flow aggregation.
I would also call for a fine-grained time interval (close to real time), 
because the aggregated resource metrics per user could be used to bill Hadoop 
usage in a shared environment (whether a private or public cloud), so users 
need to know more details about resource consumption, especially at random peak 
times.

bq. Very much agree with separation into 2 categories online versus 
periodic. I think this will be natural split between the native HBase tables 
for the former and the Phoenix approach for the latter to each emphasize their 
relative strengths.
I would question the necessity of online aggregation again if this means real 
time instead of a fine-grained time interval. Actually, as a building block, 
every container metric (CPU, memory, etc.) is generated at a time interval 
rather than in real time. As a result, we never know the exact snapshot of the 
whole system at a precise time; we can only try to get closer.


 [Aggregation] Application/Flow/User/Queue Level Aggregations
 

 Key: YARN-3815
 URL: https://issues.apache.org/jira/browse/YARN-3815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: Timeline Service Nextgen Flow, User, Queue Level 
 Aggregations (v1).pdf


 Per previous discussions in some design documents for YARN-2928, the basic 
 scenario is the query for stats can happen 

[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999

2015-06-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596230#comment-14596230
 ] 

Xuan Gong commented on YARN-3840:
-

[~Alexandre LINTE] Hey, could you tell us which version of Hadoop you are 
using? 2.7?

 Resource Manager web ui bug on main view after application number 9999
 --

 Key: YARN-3840
 URL: https://issues.apache.org/jira/browse/YARN-3840
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Centos 6.6
 Java 1.7
Reporter: LINTE

 On the WEBUI, the global main view page 
 (http://resourcemanager:8088/cluster/apps) doesn't display applications over 
 9999.
 With the command line it works (# yarn application -list).
 Regards,
 Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-06-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: YARN-2902.03.patch

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed, then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources, they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans, since the scan 
 never deletes resources in the DOWNLOADING state even if their reference count 
 is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596059#comment-14596059
 ] 

Hudson commented on YARN-3834:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #234 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/234/])
YARN-3834. Scrub debug logging of tokens during resource localization. 
Contributed by Chris Nauroth (xgong: rev 
6c7a9d502a633b5aca75c9798f19ce4a5729014e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


 Scrub debug logging of tokens during resource localization.
 ---

 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.8.0

 Attachments: YARN-3834.001.patch


 During resource localization, the NodeManager logs tokens at debug level to 
 aid troubleshooting.  This includes the full token representation.  Best 
 practice is to avoid logging anything secret, even at debug level.  We can 
 improve on this by changing the logging to use a scrubbed representation of 
 the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3820) Collect disks usages on the node

2015-06-22 Thread Robert Grandl (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596463#comment-14596463
 ] 

Robert Grandl commented on YARN-3820:
-

[~elgoiri], I fixed the warning because the HadoopQA javadoc check was -1. I 
will revert the change if HadoopQA returns +1.

 Collect disks usages on the node
 

 Key: YARN-3820
 URL: https://issues.apache.org/jira/browse/YARN-3820
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Robert Grandl
Assignee: Robert Grandl
  Labels: yarn-common, yarn-util
 Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, 
 YARN-3820-4.patch


 In this JIRA we propose to collect disk usage on a node. This JIRA is part 
 of a larger effort of monitoring resource usage on the nodes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED

2015-06-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596522#comment-14596522
 ] 

Varun Saxena commented on YARN-3798:


Thanks [~ozawa]. The explanation given by you and the subsequent discussions 
with [~rakeshr] helped a lot in clarifying the behavior of ZooKeeper.

 ZKRMStateStore shouldn't create new session without occurrence of 
 SESSIONEXPIRED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM goes down with a NoNode exception during creation of the znode for an app 
 attempt. *Please find the exception logs:*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 

[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED

2015-06-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596606#comment-14596606
 ] 

zhihai xu commented on YARN-3798:
-

I think we should also create a new session for SessionMovedException.
We hit SessionMovedException before; the following is the cause we found:
# The ZK client tried to connect to leader L. The network was very slow, so 
before the leader processed the request, the client disconnected.
# The client then re-connected to follower F, reusing the same session ID. It 
was successful.
# The request from step 1 reached the leader. The leader processed it and 
invalidated the connection created in step 2, but the client didn't know the 
connection it was using had been invalidated.
# The client got SessionMovedException when it used the connection invalidated 
by the leader for any ZooKeeper operation.

IMHO, the only way to recover from this error on the RM side is to treat 
SessionMovedException like SessionExpiredException: close the current ZK client 
and create a new one.
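A rough sketch of the proposed recovery, assuming the caller holds the
{{ZooKeeper}} client directly (the class, field and method names are
illustrative, not the actual ZKRMStateStore code):
{code}
import java.io.IOException;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkSessionRecovery implements Watcher {
  private ZooKeeper zkClient;
  private final String connectString;
  private final int sessionTimeoutMs;

  public ZkSessionRecovery(String connectString, int sessionTimeoutMs)
      throws IOException {
    this.connectString = connectString;
    this.sessionTimeoutMs = sessionTimeoutMs;
    this.zkClient = new ZooKeeper(connectString, sessionTimeoutMs, this);
  }

  @Override
  public void process(WatchedEvent event) {
    // watcher callbacks omitted in this sketch
  }

  // Treat SessionMovedException like SessionExpiredException: drop the old
  // client (and its session) and create a fresh one.
  public void runWithSessionRecovery(ZkOp op) throws Exception {
    try {
      op.run(zkClient);
    } catch (KeeperException.SessionExpiredException
        | KeeperException.SessionMovedException e) {
      zkClient.close();
      zkClient = new ZooKeeper(connectString, sessionTimeoutMs, this);
      op.run(zkClient); // single retry for brevity; real code would bound retries
    }
  }

  public interface ZkOp {
    void run(ZooKeeper zk) throws Exception;
  }
}
{code}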

 ZKRMStateStore shouldn't create new session without occurrence of 
 SESSIONEXPIRED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM goes down with a NoNode exception during creation of the znode for an app 
 attempt. *Please find the exception logs:*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 

[jira] [Updated] (YARN-2801) Documentation development for Node labels requirement

2015-06-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2801:
-
Attachment: YARN-2801.2.patch

Hi [~Naganarasimha],
Thanks for your thoughtful review; comments on your suggestions:
2) There's no preemption-related documentation in Apache Hadoop yet; I suggest 
adding this part after we have a preemption page.
10) They're what the admin should specify. I prefer not to add default values 
here because the defaults are always changing; they are tracked by 
{{yarn-default.xml}}.
12) Changed it; it should be the percentage of resources on nodes with the 
DEFAULT partition.
13) That's different: {{value/value}} and not specified means inherit from the 
parent.
18) The REST API is under development; I think we still need some time to 
finalize it for 2.8. I suggest adding that part later.
19) Added a CS link from the node labels page; I think it's a relatively 
independent feature. I suggest not referencing it from CS.

I addressed the other items in the attached patch.

Please let me know your ideas.

Thanks,

 Documentation development for Node labels requirement
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan
 Attachments: YARN-2801.1.patch, YARN-2801.2.patch


 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager

2015-06-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596717#comment-14596717
 ] 

Jason Lowe commented on YARN-3360:
--

The checkstyle comments are complaining about existing method argument lengths 
or the visibility of the Metrics fields.  I was replicating the same style used 
by all other metric fields, so this is consistent with the code base.
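For reference, a minimal sketch of the Hadoop Metrics2 style being described;
the record name, field names and methods are illustrative, not the actual
YARN-3360 patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Timeline data manager metrics", context = "yarn")
public class TimelineDataManagerMetricsSketch {
  // Package-private mutable fields, matching the style of other metrics classes.
  @Metric("getEntities calls") MutableCounterLong getEntitiesOps;
  @Metric("getEntities call time") MutableRate getEntitiesTime;
  @Metric("entities returned") MutableCounterLong entitiesReturned;

  public static TimelineDataManagerMetricsSketch create() {
    return DefaultMetricsSystem.instance().register(
        "TimelineDataManagerMetrics", "Metrics for the timeline data manager",
        new TimelineDataManagerMetricsSketch());
  }

  public void recordGetEntities(long elapsedMs, long numEntities) {
    getEntitiesOps.incr();
    getEntitiesTime.add(elapsedMs);
    entitiesReturned.incr(numEntities);
  }
}
{code}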

 Add JMX metrics to TimelineDataManager
 --

 Key: YARN-3360
 URL: https://issues.apache.org/jira/browse/YARN-3360
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: BB2015-05-TBR
 Attachments: YARN-3360.001.patch, YARN-3360.002.patch, 
 YARN-3360.003.patch


 The TimelineDataManager currently has no metrics, outside of the standard JVM 
 metrics.  It would be very useful to at least log basic counts of method 
 calls, time spent in those calls, and number of entities/events involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3360) Add JMX metrics to TimelineDataManager

2015-06-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3360:
-
Attachment: YARN-3360.003.patch

Rebased patch on trunk

 Add JMX metrics to TimelineDataManager
 --

 Key: YARN-3360
 URL: https://issues.apache.org/jira/browse/YARN-3360
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: BB2015-05-TBR
 Attachments: YARN-3360.001.patch, YARN-3360.002.patch, 
 YARN-3360.003.patch


 The TimelineDataManager currently has no metrics, outside of the standard JVM 
 metrics.  It would be very useful to at least log basic counts of method 
 calls, time spent in those calls, and number of entities/events involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-110) AM releases too many containers due to the protocol

2015-06-22 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596614#comment-14596614
 ] 

Giovanni Matteo Fumarola commented on YARN-110:
---

[~acmurthy], [~vinodkv] any updates on this? 
If you don't mind, can I work on this?

 AM releases too many containers due to the protocol
 ---

 Key: YARN-110
 URL: https://issues.apache.org/jira/browse/YARN-110
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: YARN-110.patch


 - The AM sends a request asking for 4 containers on host H1.
 - Asynchronously, host H1 reaches the RM and gets assigned 4 containers. The 
 RM, at this point, sets the value against H1 to zero in its aggregate 
 request-table for all apps.
 - Meanwhile the AM comes to need 3 more containers, so a total of 7 including 
 the 4 from the previous request.
 - Today, the AM sends the absolute number of 7 against H1 to the RM as part of 
 its request table.
 - The RM overrides its earlier value of zero against H1 with 7, and thus 
 allocates 7 more containers.
 - The AM already got 4 in this scheduling iteration, but gets 7 more, for a 
 total of 11 instead of the required 7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3842:

Attachment: YARN-3842.001.patch

That makes sense.  The patch is also a lot simpler; it just adds a retry policy 
for {{NMNotYetReadyException}}, and a test.
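A sketch of the kind of retry policy involved, built from Hadoop's standard
{{RetryPolicies}} helpers; the class name, retry count and sleep interval are
illustrative, and the actual patch wires the policy into the NM proxy rather
than exposing it like this:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.yarn.exceptions.NMNotYetReadyException;

public class NMRetryPolicySketch {
  // Retry NMNotYetReadyException with a fixed sleep; fail fast on anything else.
  public static RetryPolicy create() {
    RetryPolicy retryOnNotReady =
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(10, 1, TimeUnit.SECONDS);
    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
    exceptionToPolicy.put(NMNotYetReadyException.class, retryOnNotReady);
    return RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, exceptionToPolicy);
  }
}
{code}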

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-22 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596663#comment-14596663
 ] 

Subru Krishnan commented on YARN-3800:
--

Thanks [~adhoot] for the patch. I looked at it and just had a couple of comments:
   1. Can we have _toResource(ReservationRequest request)_ in a Reservation 
utility class rather than in _InMemoryReservationAllocation_?
   2. I feel we can update the constructor of _InMemoryReservationAllocation_ 
to take in _Map<ReservationInterval, Resource>_ instead of 
_Map<ReservationInterval, ReservationRequest>_ so that we do the translation 
only once. This should also simplify the state in GreedyReservationAgent.
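A rough sketch of the suggested utility (the class name and exact signatures
are illustrative, not existing code):
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.reservation.ReservationInterval;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class ReservationAllocationUtil {
  private ReservationAllocationUtil() {}

  // ReservationRequest -> total Resource (capability x number of containers).
  public static Resource toResource(ReservationRequest request) {
    return Resources.multiply(request.getCapability(), request.getNumContainers());
  }

  // Translate once, so InMemoryReservationAllocation can store plain Resources.
  public static Map<ReservationInterval, Resource> toResources(
      Map<ReservationInterval, ReservationRequest> allocations) {
    Map<ReservationInterval, Resource> converted =
        new HashMap<ReservationInterval, Resource>();
    for (Map.Entry<ReservationInterval, ReservationRequest> e
        : allocations.entrySet()) {
      converted.put(e.getKey(), toResource(e.getValue()));
    }
    return converted;
  }
}
{code}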

 Simplify inmemory state for ReservationAllocation
 -

 Key: YARN-3800
 URL: https://issues.apache.org/jira/browse/YARN-3800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3800.001.patch, YARN-3800.002.patch


 Instead of storing the ReservationRequest, we store the Resource for 
 allocations, as that's the only thing we need. Ultimately we convert 
 everything to Resources anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations

2015-06-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596616#comment-14596616
 ] 

Ted Yu commented on YARN-3815:
--

bq. in the spirit of readless increments as used in Tephra

The readless increment feature is implemented in CDAP; it is called delta write.
Please take a look at:
cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java
cdap-hbase-compat-0.98//src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java

The implementation uses an HBase coprocessor, BTW.

 [Aggregation] Application/Flow/User/Queue Level Aggregations
 

 Key: YARN-3815
 URL: https://issues.apache.org/jira/browse/YARN-3815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: Timeline Service Nextgen Flow, User, Queue Level 
 Aggregations (v1).pdf


 Per previous discussions in some design documents for YARN-2928, the basic 
 scenario is the query for stats can happen on:
 - Application level, expect return: an application with aggregated stats
 - Flow level, expect return: aggregated stats for a flow_run, flow_version 
 and flow 
 - User level, expect return: aggregated stats for applications submitted by 
 user
 - Queue level, expect return: aggregated stats for applications within the 
 Queue
 Application states is the basic building block for all other level 
 aggregations. We can provide Flow/User/Queue level aggregated statistics info 
 based on application states (a dedicated table for application states is 
 needed which is missing from previous design documents like HBase/Phoenix 
 schema design). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596620#comment-14596620
 ] 

Hadoop QA commented on YARN-3635:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 14s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 48s | The applied patch generated  2  
additional warning messages. |
| {color:red}-1{color} | release audit |   0m 18s | The applied patch generated 
4 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 47s | The applied patch generated  
18 new checkstyle issues (total was 204, now 215). |
| {color:red}-1{color} | whitespace |   0m  3s | The patch has 15  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 27s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  61m  8s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  99m 39s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741096/YARN-3635.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/diffJavadocWarnings.txt
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8311/console |


This message was automatically generated.

 Get-queue-mapping should be a common interface of YarnScheduler
 ---

 Key: YARN-3635
 URL: https://issues.apache.org/jira/browse/YARN-3635
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
 YARN-3635.4.patch


 Currently, both the fair and capacity schedulers support queue mapping, which 
 means the scheduler can change the queue of an application after it is 
 submitted to the scheduler.
 One issue with doing this in a specific scheduler is: if the queue after 
 mapping has a different maximum_allocation/default-node-label-expression than 
 the original queue, {{validateAndCreateResourceRequest}} in RMAppManager 
 checks the wrong queue.
 I propose to make the queue mapping a common interface of the scheduler, and 
 have RMAppManager set the queue after mapping before doing validations.
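 A hypothetical sketch of what such a common interface could look like; the
 interface, method name and signature are invented for illustration and are not
 part of the actual proposal:
 {code}
 import org.apache.hadoop.yarn.exceptions.YarnException;

 // Hypothetical scheduler-side hook: resolve queue placement before
 // RMAppManager runs validateAndCreateResourceRequest, so validation happens
 // against the post-mapping queue.
 public interface QueueMappingAware {
   /**
    * @param requestedQueue queue named in the submission context
    * @param user           user submitting the application
    * @return the queue the application will actually run in after mapping
    */
   String getMappedQueueForApp(String requestedQueue, String user)
       throws YarnException;
 }
 {code}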



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla moved MAPREDUCE-6409 to YARN-3842:
---

 Target Version/s: 2.7.1  (was: 2.7.1)
Affects Version/s: (was: 2.7.0)
   2.7.0
  Key: YARN-3842  (was: MAPREDUCE-6409)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596706#comment-14596706
 ] 

Hadoop QA commented on YARN-2801:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   3m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | site |   1m 58s | Site compilation is broken. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   5m 26s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741138/YARN-2801.2.patch |
| Optional Tests | site |
| git revision | trunk / 11ac848 |
| site | 
https://builds.apache.org/job/PreCommit-YARN-Build/8315/artifact/patchprocess/patchSiteWarnings.txt
 |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8315/console |


This message was automatically generated.

 Documentation development for Node labels requirement
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan
 Attachments: YARN-2801.1.patch, YARN-2801.2.patch


 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon updated YARN-3843:

Attachment: YARN-3843.01.patch

 Fair Scheduler should not accept apps with space keys as queue name
 ---

 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: YARN-3843.01.patch


 As with YARN-461, since an empty string is not a valid queue name, queue names 
 made up of space characters should not be accepted either, nor should spaces 
 be allowed as a prefix or suffix of a queue name, 
 e.g. root.test.queuename with a trailing space, or root.test. queuename with 
 an embedded space.
 I have 2 specific cases that kill the RM with these space characters as part 
 of the queue name.
 1) Without a placement policy (hadoop 2.4.0 and above): 
 when a job is submitted with a space character as the queue name, 
 e.g. mapreduce.job.queuename set to a single space.
 2) With a placement policy (hadoop 2.5.0 and above): 
 once a job is submitted without a space in the queue name, and another job is 
 then submitted with a trailing space, 
 e.g. 1st time: mapreduce.job.queuename=root.test.user1 
 2nd time: mapreduce.job.queuename=root.test.user1 with a trailing space
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.724 sec   ERROR!
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596867#comment-14596867
 ] 

Hadoop QA commented on YARN-3842:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 16s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 28s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m 27s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  49m 43s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741154/YARN-3842.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 077250d |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8317/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8317/console |


This message was automatically generated.

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirement

2015-06-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596886#comment-14596886
 ] 

Naganarasimha G R commented on YARN-2801:
-

Hi [~leftnoteasy], it seems like after applying the patch, mvn site is failing.

 Documentation development for Node labels requirement
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan
 Attachments: YARN-2801.1.patch, YARN-2801.2.patch


 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3842:

Attachment: YARN-3842.002.patch

The new patch makes the changes Karthik suggested.  I also added a few comments 
and renamed {{isExpectingNMNotYetReadyException}} to 
{{shouldThrowNMNotYetReadyException}} for clarity.

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2015-06-22 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596837#comment-14596837
 ] 

Ming Ma commented on YARN-2862:
---

Thanks, [~rohithsharma] and [~leftnoteasy]. Yes, YARN-3410 will be useful, but 
admins would still need to look through RM logs to identify those apps. Would 
it be useful to provide a new RM startup option to delete or skip such apps 
automatically?

 RM might not start if the machine was hard shutdown and 
 FileSystemRMStateStore was used
 ---

 Key: YARN-2862
 URL: https://issues.apache.org/jira/browse/YARN-2862
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma

 This might be a known issue. Given that FileSystemRMStateStore isn't used for 
 the HA scenario, it might not be that important, unless there is something we 
 need to fix at the RM layer to make it more tolerant of RMStateStore issues.
 When the RM was hard shutdown, the OS might not have had a chance to persist 
 blocks. Some of the stored application data ended up with size zero after 
 reboot, and the RM didn't like that.
 {noformat}
 ls -al 
 /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
 total 156
 drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
 drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 appattempt_1412702189634_324351_01
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 .appattempt_1412702189634_324351_01.crc
 -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
 -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
 {noformat}
 When RM starts up
 {noformat}
 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
 opening checksum file: 
 file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
   Ignoring exception:
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
 ...
 2014-11-13 17:40:48,876 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
 load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)
Dongwook Kwon created YARN-3843:
---

 Summary: Fair Scheduler should not accept apps with space keys as 
queue name
 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.4.0
Reporter: Dongwook Kwon
Priority: Minor


As with YARN-461, since an empty string is not a valid queue name, queue names 
made up of space characters should not be accepted either, nor should spaces be 
allowed as a prefix or suffix of a queue name, e.g. root.test.queuename with a 
trailing space, or root.test. queuename with an embedded space.

I have 2 specific cases that kill the RM with these space characters as part of 
the queue name.
1) Without a placement policy (hadoop 2.4.0 and above): 
when a job is submitted with a space character as the queue name, 
e.g. mapreduce.job.queuename set to a single space.

2) With a placement policy (hadoop 2.5.0 and above): 
once a job is submitted without a space in the queue name, and another job is 
then submitted with a trailing space, 
e.g. 1st time: mapreduce.job.queuename=root.test.user1 
2nd time: mapreduce.job.queuename=root.test.user1 with a trailing space

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
  Time elapsed: 0.724 sec   ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
{code}
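A minimal sketch of the kind of validation being requested; the class, method 
and call site are illustrative only:
{code}
// Hypothetical guard in the Fair Scheduler's queue assignment path: reject
// queue names that are empty after trimming, or that carry leading/trailing
// whitespace, instead of creating a broken queue (and its metrics source).
public final class QueueNameValidator {
  private QueueNameValidator() {}

  public static boolean isValidQueueName(String queueName) {
    if (queueName == null) {
      return false;
    }
    String trimmed = queueName.trim();
    // Empty (or all-space) names and names with surrounding spaces are rejected.
    return !trimmed.isEmpty() && trimmed.equals(queueName);
  }
}
{code}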



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2

2015-06-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596925#comment-14596925
 ] 

Sangjin Lee commented on YARN-3792:
---

The latest patch LGTM. Once the jenkins comes back, I'll go ahead and merge it.

Folks, do let me know soon if you have any other feedback. Thanks!

 Test case failures in TestDistributedShell and some issue fixes related to 
 ATSV2
 

 Key: YARN-3792
 URL: https://issues.apache.org/jira/browse/YARN-3792
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: YARN-3792-YARN-2928.001.patch, 
 YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, 
 YARN-3792-YARN-2928.004.patch


 # encountered [testcase 
 failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
 which were happening even without the patch modifications in YARN-3044:
 TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
 TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
 TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression
 # Remove unused {{enableATSV1}} in TestDistributedShell
 # Container metrics need to be published only for the v2 test cases of 
 TestDistributedShell
 # NullPointerException was thrown in TimelineClientImpl.constructResURI when the Aux 
 service was not configured and {{TimelineClient.putObjects}} was getting 
 invoked.
 # Race condition between the Application events being published and the test case 
 verification of the RM's ApplicationFinished Timeline Events
 # Application Tags are converted to lowercase in 
 ApplicationSubmissionContextPBImpl, hence RMTimelineCollector was not able to 
 detect the custom flow details of the app



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3842) NMProxy should retry on NMNotYetReadyException

2015-06-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3842:
---
Summary: NMProxy should retry on NMNotYetReadyException  (was: NM restarts 
could lead to app failures)

 NMProxy should retry on NMNotYetReadyException
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.
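
A minimal sketch of the retry wiring the new summary points at (this is not 
the actual ServerProxy/NMProxy patch; the class name and retry limits below 
are made-up illustrations): map NMNotYetReadyException to a wait-and-retry 
policy so a container launch against a freshly restarted NM is retried instead 
of being surfaced to the AM as a task attempt failure.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.yarn.exceptions.NMNotYetReadyException;

public class NMProxyRetrySketch {
  // Illustrative values only; real limits would come from configuration.
  static RetryPolicy createRetryPolicy() {
    RetryPolicy waitAndRetry = RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        15 * 60 * 1000L, 10 * 1000L, TimeUnit.MILLISECONDS);

    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
    // Treat "NM restarted but not yet re-registered with the RM" as retriable.
    exceptionToPolicy.put(NMNotYetReadyException.class, waitAndRetry);
    exceptionToPolicy.put(java.net.ConnectException.class, waitAndRetry);

    return RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, exceptionToPolicy);
  }
}
{code}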



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2

2015-06-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596730#comment-14596730
 ] 

Sangjin Lee commented on YARN-3792:
---

Thanks [~Naganarasimha] for the update!

+1 on the test failure. It appears to be an issue unrelated to the timeline 
service.

It does seem like the whitespace is related to the patch (or in the vicinity of 
the patch). Could you kindly do a quick change to remove those extra spaces?

Also, for findbugs, I ran findbugs against those two projects (distributed 
shell and resource manager). I do see several findbugs warnings, and they are 
not introduced by this patch but do appear to be related to the YARN-2928 work.

distributed shell:
{code}
<file classname='org.apache.hadoop.yarn.applications.distributedshell.Client'>
  <BugInstance type='DM_BOXED_PRIMITIVE_FOR_PARSING' priority='High' category='PERFORMANCE'
    message='Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.applications.distributedshell.Client.init(String[])'
    lineNumber='466'/>
</file>
{code}

resource manager:
{code}
<file classname='org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher'>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='79'/>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.AppAttemptRegisteredEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='76'/>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationACLsUpdatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='73'/>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationCreatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='67'/>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ApplicationFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='70'/>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerCreatedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='82'/>
  <BugInstance type='BC_UNCONFIRMED_CAST' priority='Normal' category='STYLE'
    message='Unchecked/unconfirmed cast from org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsEvent to org.apache.hadoop.yarn.server.resourcemanager.metrics.ContainerFinishedEvent in org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(SystemMetricsEvent)'
    lineNumber='85'/>
</file>
{code}

It would be nice to address them (at least the one on Client.java) here, but if 
you're not inclined, we could do it later... Let me know how you want to 
proceed.
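
For reference, DM_BOXED_PRIMITIVE_FOR_PARSING (the Client.java warning above) 
usually flags parsing a string through a boxed wrapper that is immediately 
unboxed again. A hypothetical illustration of the pattern and its usual fix, 
not the actual Client.java code:

{code}
public class BoxedParsingSketch {
  public static void main(String[] args) {
    // Pattern findbugs reports as DM_BOXED_PRIMITIVE_FOR_PARSING: the string
    // is parsed into an Integer box that is immediately unboxed.
    int flagged = Integer.valueOf(args[0]).intValue();

    // Usual fix: parse straight to the primitive, no boxing involved.
    int fixed = Integer.parseInt(args[0]);

    System.out.println(flagged == fixed);
  }
}
{code}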

 Test case failures in TestDistributedShell and some issue fixes related to 
 ATSV2
 

 Key: YARN-3792
 URL: https://issues.apache.org/jira/browse/YARN-3792
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: YARN-3792-YARN-2928.001.patch, 
 YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch


 # encountered [testcase 
 failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
 which was happening even 

[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-06-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3635:
-
Attachment: YARN-3635.5.patch

Attached ver.5, fixed a bunch of warnings.

 Get-queue-mapping should be a common interface of YarnScheduler
 ---

 Key: YARN-3635
 URL: https://issues.apache.org/jira/browse/YARN-3635
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
 YARN-3635.4.patch, YARN-3635.5.patch


 Currently, both the fair and capacity schedulers support queue mapping, which 
 lets the scheduler change the queue of an application after it has been 
 submitted.
 One issue with doing this inside a specific scheduler is: if the queue after 
 mapping has a different maximum_allocation/default-node-label-expression from 
 the original queue, {{validateAndCreateResourceRequest}} in RMAppManager 
 checks the wrong queue.
 I propose to make queue mapping a common interface of the scheduler, and have 
 RMAppManager set the queue after mapping before doing validations.
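
A rough sketch of the shape such a common interface could take (the name and 
signature below are hypothetical, not the attached patch):

{code}
/**
 * Hypothetical sketch only: a common scheduler-side hook that would let
 * RMAppManager resolve the queue produced by the fair/capacity queue-mapping
 * rules before it runs validateAndCreateResourceRequest, so validation
 * happens against the mapped queue's
 * maximum_allocation / default-node-label-expression.
 */
public interface QueueMappingSketch {

  /**
   * @param requestedQueue queue name from the application submission context
   * @param user submitting user, used by user/group based mapping rules
   * @return the queue the application would actually be placed in
   */
  String getMappedQueue(String requestedQueue, String user);
}
{code}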



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596776#comment-14596776
 ] 

Jian He commented on YARN-3842:
---

I think the latest patch is safe for 2.7.1,  +1

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596783#comment-14596783
 ] 

Karthik Kambatla commented on YARN-3842:


+1, pending Jenkins. 

Thanks for your review, [~jianhe]. I'll go ahead and commit this if Jenkins is 
fine with it. 

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596842#comment-14596842
 ] 

Dongwook Kwon commented on YARN-3843:
-

From my investigation, QueueMetrics doesn't keep space characters at the start 
or end of queue name segments; it trims them and omits empty strings:

static final Splitter Q_SPLITTER = 
Splitter.on('.').omitEmptyStrings().trimResults();

https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L112
https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L85

So in the FairScheduler, "root.adhoc.birvine " (with a trailing space) is 
treated as a different queue from "root.adhoc.birvine" because it has one more 
character, but in QueueMetrics, because the segments are trimmed, the two 
different queue names suddenly map to the same metrics source, which causes 
the error "Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already 
exists!"
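
A minimal sketch of that mismatch, using only the Guava Splitter definition 
quoted from QueueMetrics above (the wrapper class and printed comments are 
illustrative):

{code}
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;

public class QueueNameTrimSketch {
  // Same splitter definition that QueueMetrics uses for source names.
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    String original = "root.adhoc.birvine";
    String withTrailingSpace = "root.adhoc.birvine ";

    // The FairScheduler's QueueManager sees two distinct queue names...
    System.out.println(original.equals(withTrailingSpace));  // false

    // ...but after trimming, both produce the same metrics source segments,
    // so registering the second queue's metrics fails with "already exists".
    System.out.println(Lists.newArrayList(Q_SPLITTER.split(original)));
    System.out.println(Lists.newArrayList(Q_SPLITTER.split(withTrailingSpace)));
    // both print: [root, adhoc, birvine]
  }
}
{code}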


 Fair Scheduler should not accept apps with space keys as queue name
 ---

 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor

 As YARN-461, since empty string queue name is not valid, queue name with 
 space keys such as   ,should not be accepted either, also not as 
 prefix nor postfix. 
 e.g) root.test.queuename  , or root.test. queuename
 I have 2 specific cases kill RM with these space keys as part of queue name.
 1) Without placement policy (hadoop 2.4.0 and above), 
 When a job is submitted with  (space key) as queue name
 e.g) mapreduce.job.queuename= 
 2) With placement policy (hadoop 2.5.0 and above)
  Once a job is submitted without space key as queue name, and submit another 
 job with space key.
 e.g) 1st time: mapreduce.job.queuename=root.test.user1 
 2nd time: mapreduce.job.queuename=root.test.user1 
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.724 sec   ERROR!
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3748) Cleanup Findbugs volatile warnings

2015-06-22 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596936#comment-14596936
 ] 

Gabor Liptak commented on YARN-3748:


Any other changes needed before this can be considered for commit? Thanks

 Cleanup Findbugs volatile warnings
 --

 Key: YARN-3748
 URL: https://issues.apache.org/jira/browse/YARN-3748
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Gabor Liptak
Priority: Minor
 Attachments: YARN-3748.1.patch, YARN-3748.2.patch, YARN-3748.3.patch, 
 YARN-3748.5.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596963#comment-14596963
 ] 

Hudson commented on YARN-3835:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8051 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8051/])
YARN-3835. hadoop-yarn-server-resourcemanager test package bundles 
core-site.xml, yarn-site.xml (vamsee via rkanter) (rkanter: rev 
99271b762129d78c86f3c9733a24c77962b0b3f7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


 hadoop-yarn-server-resourcemanager test package bundles core-site.xml, 
 yarn-site.xml
 

 Key: YARN-3835
 URL: https://issues.apache.org/jira/browse/YARN-3835
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Vamsee Yarlagadda
Assignee: Vamsee Yarlagadda
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3835.patch


 It looks like by default YARN bundles core-site.xml and yarn-site.xml in the 
 test artifact of hadoop-yarn-server-resourcemanager, which means that any 
 downstream project which uses this as a dependency can have a problem picking 
 up the user-supplied/environment-supplied core-site.xml and yarn-site.xml.
 So we should ideally exclude these .xml files from being bundled into the 
 test-jar (similar to YARN-1748).
 I also proactively looked at other YARN modules where this might be 
 happening. 
 {code}
 vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml
 ./hadoop-yarn/conf/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml
 {code}
 And out of these only two modules (hadoop-yarn-server-resourcemanager, 
 hadoop-yarn-server-tests) are building test-jars. In future, if we start 
 building test-jar of other modules, we should exclude these xml files from 
 being bundled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596733#comment-14596733
 ] 

Hadoop QA commented on YARN-3842:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 30s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 47s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m  5s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  49m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741131/YARN-3842.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8314/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8314/console |


This message was automatically generated.

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596858#comment-14596858
 ] 

Dongwook Kwon commented on YARN-3843:
-

Thanks, you're right, it's a duplicate. I didn't find the other JIRA; I will 
close this one.

 Fair Scheduler should not accept apps with space keys as queue name
 ---

 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: YARN-3843.01.patch


 As YARN-461, since empty string queue name is not valid, queue name with 
 space keys such as   ,should not be accepted either, also not as 
 prefix nor postfix. 
 e.g) root.test.queuename  , or root.test. queuename
 I have 2 specific cases kill RM with these space keys as part of queue name.
 1) Without placement policy (hadoop 2.4.0 and above), 
 When a job is submitted with  (space key) as queue name
 e.g) mapreduce.job.queuename= 
 2) With placement policy (hadoop 2.5.0 and above)
  Once a job is submitted without space key as queue name, and submit another 
 job with space key.
 e.g) 1st time: mapreduce.job.queuename=root.test.user1 
 2nd time: mapreduce.job.queuename=root.test.user1 
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.724 sec   ERROR!
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2801) Documentation development for Node labels requirment

2015-06-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2801:

Assignee: Wangda Tan  (was: Naganarasimha G R)

 Documentation development for Node labels requirment
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan
 Attachments: YARN-2801.1.patch, YARN-2801.2.patch


 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2801) Documentation development for Node labels requirment

2015-06-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-2801:
---

Assignee: Naganarasimha G R  (was: Wangda Tan)

 Documentation development for Node labels requirment
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Naganarasimha G R
 Attachments: YARN-2801.1.patch, YARN-2801.2.patch


 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NMProxy should retry on NMNotYetReadyException

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596946#comment-14596946
 ] 

Hudson commented on YARN-3842:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8050 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8050/])
YARN-3842. NMProxy should retry on NMNotYetReadyException. (Robert Kanter via 
kasha) (kasha: rev 5ebf2817e58e1be8214dc1916a694a912075aa0a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* hadoop-yarn-project/CHANGES.txt


 NMProxy should retry on NMNotYetReadyException
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Fix For: 2.7.1

 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596957#comment-14596957
 ] 

Hadoop QA commented on YARN-3800:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  6s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  7 
new checkstyle issues (total was 55, now 56). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m  0s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 29s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741165/YARN-3800.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fac4e04 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8318/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8318/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8318/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8318/console |


This message was automatically generated.

 Simplify inmemory state for ReservationAllocation
 -

 Key: YARN-3800
 URL: https://issues.apache.org/jira/browse/YARN-3800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
 YARN-3800.002.patch


 Instead of storing the ReservationRequest we store the Resource for 
 allocations, as that's the only thing we need. Ultimately we convert 
 everything to resources anyway.
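
A rough before/after sketch of the simplification being described (the field 
names below are illustrative, not the patch itself):

{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.reservation.ReservationInterval;

public class InMemoryReservationStateSketch {
  // Before: the whole ReservationRequest is kept per interval, even though
  // only its capability (a Resource) is ever consulted.
  private Map<ReservationInterval, ReservationRequest> allocationRequests;

  // After: keep just the Resource per interval, since the plan arithmetic
  // converts everything to resources anyway.
  private Map<ReservationInterval, Resource> allocations;
}
{code}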



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596756#comment-14596756
 ] 

Karthik Kambatla commented on YARN-3842:


Thanks for the quick turnaround on this, Robert. 

One nit-pick on the test: would the following be more concise? 

{code}
if (retryCount < 5) {
  retryCount++;
  if (isExpectingNMNotYetReadyException) {
    containerManager.setBlockNewContainerRequests(true);
  } else {
    throw new java.net.ConnectException("start container exception");
  }
} else {
  containerManager.setBlockNewContainerRequests(false);
}
return super.startContainers(requests);
{code}

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596848#comment-14596848
 ] 

zhihai xu commented on YARN-3843:
-

Hi [~dongwook], thanks for reporting this issue. I think this issue was fixed 
in YARN-3241.

 Fair Scheduler should not accept apps with space keys as queue name
 ---

 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor

 As YARN-461, since empty string queue name is not valid, queue name with 
 space keys such as   ,should not be accepted either, also not as 
 prefix nor postfix. 
 e.g) root.test.queuename  , or root.test. queuename
 I have 2 specific cases kill RM with these space keys as part of queue name.
 1) Without placement policy (hadoop 2.4.0 and above), 
 When a job is submitted with  (space key) as queue name
 e.g) mapreduce.job.queuename= 
 2) With placement policy (hadoop 2.5.0 and above)
  Once a job is submitted without space key as queue name, and submit another 
 job with space key.
 e.g) 1st time: mapreduce.job.queuename=root.test.user1 
 2nd time: mapreduce.job.queuename=root.test.user1 
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.724 sec   ERROR!
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2

2015-06-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3792:

Attachment: YARN-3792-YARN-2928.004.patch

Hi [~sjlee0], 
Corrected the whitespace and the findbugs issue in Client.java and am attaching 
a patch for it; the remaining warnings don't seem to be a real problem, and 
fixing them would require unnecessary checks.

 Test case failures in TestDistributedShell and some issue fixes related to 
 ATSV2
 

 Key: YARN-3792
 URL: https://issues.apache.org/jira/browse/YARN-3792
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: YARN-3792-YARN-2928.001.patch, 
 YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, 
 YARN-3792-YARN-2928.004.patch


 # encountered [testcase 
 failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
 which was happening even without the patch modifications in YARN-3044
 TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
 TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
 TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression
 # Remove unused {{enableATSV1}} in testDisstributedShell
 # container metrics needs to be published only for v2 test cases of 
 testDisstributedShell
 # Nullpointer was thrown in TimelineClientImpl.constructResURI when Aux 
 service was not configured and {{TimelineClient.putObjects}} was getting 
 invoked.
 # Race condition for the Application events to published and test case 
 verification for RM's ApplicationFinished Timeline Events
 # Application Tags for converted to lowercase in 
 ApplicationSubmissionContextPBimpl, hence RMTimelinecollector was not able to 
 detect to custom flow details of the app



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596894#comment-14596894
 ] 

Hadoop QA commented on YARN-3635:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 46s | The applied patch generated  
18 new checkstyle issues (total was 204, now 215). |
| {color:red}-1{color} | whitespace |   0m  3s | The patch has 15  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 30s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  50m 14s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 34s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741145/YARN-3635.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 077250d |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8316/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8316/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8316/console |


This message was automatically generated.

 Get-queue-mapping should be a common interface of YarnScheduler
 ---

 Key: YARN-3635
 URL: https://issues.apache.org/jira/browse/YARN-3635
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
 YARN-3635.4.patch, YARN-3635.5.patch


 Currently, both of fair/capacity scheduler support queue mapping, which makes 
 scheduler can change queue of an application after submitted to scheduler.
 One issue of doing this in specific scheduler is: If the queue after mapping 
 has different maximum_allocation/default-node-label-expression of the 
 original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
 the wrong queue.
 I propose to make the queue mapping as a common interface of scheduler, and 
 RMAppManager set the queue after mapping before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state

2015-06-22 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-3705:
---
Attachment: YARN-3705.002.patch

I've attached 002, addressing the whitespace warnings. The 
TestWorkPreservingRMRestart failure is not related to the code path the patch 
fixes.

 forcemanual transitionToStandby in RM-HA automatic-failover mode should 
 change elector state
 

 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: YARN-3705.001.patch, YARN-3705.002.patch


 Executing {{rmadmin -transitionToStandby --forcemanual}} in 
 automatic-failover.enabled mode makes the ResourceManager standby while 
 keeping the state of the ActiveStandbyElector. It should make the elector 
 quit and rejoin in order to enable other candidates to be promoted; 
 otherwise, forcemanual transition should not be allowed in automatic-failover 
 mode, in order to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596765#comment-14596765
 ] 

Robert Kanter commented on YARN-3842:
-

I had sort of just split {{startContainers}} into two sections (one for each 
part of the test), but this is a lot more concise.  I'll do that.

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted
 3. A tries to launch container on node N.
 3 could lead to an NMNotYetReadyException depending on whether NM N has 
 registered with the RM. In MR, this is considered a task attempt failure. A 
 few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3800:

Attachment: YARN-3800.002.patch

Addressed feedback

 Simplify inmemory state for ReservationAllocation
 -

 Key: YARN-3800
 URL: https://issues.apache.org/jira/browse/YARN-3800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
 YARN-3800.002.patch


 Instead of storing the ReservationRequest we store the Resource for 
 allocations, as thats the only thing we need. Ultimately we convert 
 everything to resources anyway



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon resolved YARN-3843.
-
  Resolution: Duplicate
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0

 Fair Scheduler should not accept apps with space keys as queue name
 ---

 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3843.01.patch


 As YARN-461, since empty string queue name is not valid, queue name with 
 space keys such as   ,should not be accepted either, also not as 
 prefix nor postfix. 
 e.g) root.test.queuename  , or root.test. queuename
 I have 2 specific cases kill RM with these space keys as part of queue name.
 1) Without placement policy (hadoop 2.4.0 and above), 
 When a job is submitted with  (space key) as queue name
 e.g) mapreduce.job.queuename= 
 2) With placement policy (hadoop 2.5.0 and above)
  Once a job is submitted without space key as queue name, and submit another 
 job with space key.
 e.g) 1st time: mapreduce.job.queuename=root.test.user1 
 2nd time: mapreduce.job.queuename=root.test.user1 
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.724 sec   ERROR!
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2801) Documentation development for Node labels requirment

2015-06-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596910#comment-14596910
 ] 

Naganarasimha G R commented on YARN-2801:
-

Hi [~leftnoteasy],
After escaping the links, it seems like the patch is getting applied. A few nits:
* ??User need configure how many resources?? => {{User need configure how much 
resource of each partition}}
* The points in the note after the configuration section need to come as a list
* ??application can use following Java APIs?? => ??Application can use 
following Java APIs??

Apart from these, the rest seems to be fine!

 Documentation development for Node labels requirment
 

 Key: YARN-2801
 URL: https://issues.apache.org/jira/browse/YARN-2801
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Gururaj Shetty
Assignee: Wangda Tan
 Attachments: YARN-2801.1.patch, YARN-2801.2.patch


 Documentation needs to be developed for the node label requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2015-06-22 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595423#comment-14595423
 ] 

Hui Zheng commented on YARN-3001:
-

The only non-INFO log is the following (it is so sudden that there is no other 
WARN or ERROR around it).
There are several tens of thousands of jobs per day.
{code}
2015-06-21 09:53:44,696 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type NODE_UPDATE to the scheduler
java.lang.ArithmeticException: / by zero
at 
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1335)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignNodeLocalContainers(LeafQueue.java:1185)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:557)
at java.lang.Thread.run(Thread.java:724)
2015-06-21 09:53:44,696 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}
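
A minimal sketch of how that arithmetic can fail, assuming 
DefaultResourceCalculator.computeAvailableContainers divides the available 
memory by the required memory (so a resource request carrying 0 MB of memory 
would produce exactly this FATAL); this is an illustration, not a confirmed 
root cause for the cluster above:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;

public class DivideByZeroSketch {
  public static void main(String[] args) {
    ResourceCalculator rc = new DefaultResourceCalculator();

    Resource available = Resource.newInstance(8192, 8);
    // A required resource with 0 MB of memory makes the memory-only
    // calculator divide by zero.
    Resource zeroMemoryRequest = Resource.newInstance(0, 1);

    // Throws java.lang.ArithmeticException: / by zero
    System.out.println(
        rc.computeAvailableContainers(available, zeroMemoryRequest));
  }
}
{code}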

 RM dies because of divide by zero
 -

 Key: YARN-3001
 URL: https://issues.apache.org/jira/browse/YARN-3001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: hoelog
Assignee: Rohith Sharma K S

 RM dies because of divide by zero exception.
 {code}
 2014-12-31 21:27:05,022 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
 at java.lang.Thread.run(Thread.java:745)
 2014-12-31 21:27:05,023 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595749#comment-14595749
 ] 

Hudson commented on YARN-3834:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #236 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/236/])
YARN-3834. Scrub debug logging of tokens during resource localization. 
Contributed by Chris Nauroth (xgong: rev 
6c7a9d502a633b5aca75c9798f19ce4a5729014e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


 Scrub debug logging of tokens during resource localization.
 ---

 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.8.0

 Attachments: YARN-3834.001.patch


 During resource localization, the NodeManager logs tokens at debug level to 
 aid troubleshooting.  This includes the full token representation.  Best 
 practice is to avoid logging anything secret, even at debug level.  We can 
 improve on this by changing the logging to use a scrubbed representation of 
 the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3840) Resource Manager web ui bug on main view after application number 9999

2015-06-22 Thread LINTE (JIRA)
LINTE created YARN-3840:
---

 Summary: Resource Manager web ui bug on main view after 
application number 9999
 Key: YARN-3840
 URL: https://issues.apache.org/jira/browse/YARN-3840
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Centos 6.6
Java 1.7
Reporter: LINTE


On the WEBUI, the global main view page : 
http://resourcemanager:8088/cluster/apps doesn't display applications over 9999.

With command line it works (# yarn application -list).

Regards,

Alexandre





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595805#comment-14595805
 ] 

Hudson commented on YARN-3834:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #966 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/966/])
YARN-3834. Scrub debug logging of tokens during resource localization. 
Contributed by Chris Nauroth (xgong: rev 
6c7a9d502a633b5aca75c9798f19ce4a5729014e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


 Scrub debug logging of tokens during resource localization.
 ---

 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.8.0

 Attachments: YARN-3834.001.patch


 During resource localization, the NodeManager logs tokens at debug level to 
 aid troubleshooting.  This includes the full token representation.  Best 
 practice is to avoid logging anything secret, even at debug level.  We can 
 improve on this by changing the logging to use a scrubbed representation of 
 the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999

2015-06-22 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595826#comment-14595826
 ] 

Devaraj K commented on YARN-3840:
-

Thanks [~Alexandre LINTE] for reporting the issue. 

Can you paste the exception if you see anything in the RM UI or in the RM logs?

 Resource Manager web ui bug on main view after application number 9999
 --

 Key: YARN-3840
 URL: https://issues.apache.org/jira/browse/YARN-3840
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Centos 6.6
 Java 1.7
Reporter: LINTE

 On the WEBUI, the global main view page : 
 http://resourcemanager:8088/cluster/apps doesn't display applications over 9999.
 With command line it works (# yarn application -list).
 Regards,
 Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595457#comment-14595457
 ] 

Hadoop QA commented on YARN-3826:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 42s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   9m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 28s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 40s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 44s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  61m 36s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 106m 18s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart 
|
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740355/YARN-3826.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c7a9d5 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8306/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8306/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8306/console |


This message was automatically generated.

 Race condition in ResourceTrackerService: potential wrong diagnostics messages
 --

 Key: YARN-3826
 URL: https://issues.apache.org/jira/browse/YARN-3826
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3826.01.patch


 Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which 
 can be called concurrently, the static {{resync}} and {{shutdown}} may have 
 wrong diagnostics messages in some cases.
 On the other hand, these static members can hardly save any memory, since the 
 normal heartbeat responses are created for each heartbeat.
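
A minimal sketch (not the actual ResourceTrackerService code; all names are 
illustrative) of why a shared static response object can surface the wrong 
diagnostics when {{setDiagnosticsMessage}} is called from concurrent heartbeats:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Two heartbeat handlers share one static response; the caller for node-A can
// end up reading the diagnostics written for node-B.
public class SharedResponseRaceSketch {
  static class Response { volatile String diagnostics; }
  static final Response SHARED = new Response();  // analogous to the static resync/shutdown responses

  static Response heartbeat(String nodeId) {
    SHARED.diagnostics = "Node " + nodeId + " should resync";
    return SHARED;                                // may already carry the other node's message
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Future<Response> a = pool.submit(() -> heartbeat("node-A"));
    Future<Response> b = pool.submit(() -> heartbeat("node-B"));
    // Both futures return the same shared object, so the message below may be
    // wrong for one of the two nodes.
    System.out.println(a.get().diagnostics + " / " + b.get().diagnostics);
    pool.shutdown();
  }
}
{code}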



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3768) Index out of range exception with environment variables without values

2015-06-22 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3768:

Attachment: YARN-3768.001.patch

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner
Assignee: zhihai xu
 Attachments: YARN-3768.000.patch, YARN-3768.001.patch


 Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because java will not return empty strings from the 
 split method. Similar to this 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595717#comment-14595717
 ] 

Hadoop QA commented on YARN-3826:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 32s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 48s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  50m 43s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m  7s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740355/YARN-3826.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8308/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8308/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8308/console |


This message was automatically generated.

 Race condition in ResourceTrackerService: potential wrong diagnostics messages
 --

 Key: YARN-3826
 URL: https://issues.apache.org/jira/browse/YARN-3826
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3826.01.patch


 Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which 
 can be called concurrently, the static {{resync}} and {{shutdown}} may have 
 wrong diagnostics messages in some cases.
 On the other hand, these static members can hardly save any memory, since the 
 normal heartbeat responses are created for each heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595495#comment-14595495
 ] 

zhihai xu commented on YARN-3768:
-

Hi [~xgong], thanks for the review. I uploaded a new patch YARN-3768.001.patch, 
in which I added a test case to verify that bad environment variables are skipped.
Whether to keep trailing empty strings depends on whether an environment 
variable with an empty value is a valid use case.
MAPREDUCE-5965 adds an option to configure an environment variable with an 
empty value if stream.jobconf.truncate.limit is 0, so an environment variable 
with an empty value does look like a valid use case.
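
For reference, a small self-contained sketch of the JDK split behaviour under 
discussion (plain Java semantics, not the patch itself): {{String.split}} drops 
trailing empty strings unless a negative limit is passed, so an entry like 
{{FOO=}} yields a one-element array and reading {{parts[1]}} fails.
{code}
import java.util.Arrays;

public class SplitSketch {
  public static void main(String[] args) {
    String entry = "FOO=";                         // environment variable without a value
    String[] dropped = entry.split("=");           // ["FOO"]      -> parts[1] would be out of range
    String[] kept = entry.split("=", -1);          // ["FOO", ""]  -> empty value preserved
    System.out.println(Arrays.toString(dropped));  // [FOO]
    System.out.println(Arrays.toString(kept));     // [FOO, ]
    String value = kept.length > 1 ? kept[1] : ""; // defensive read either way
    System.out.println("value='" + value + "'");
  }
}
{code}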

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner
Assignee: zhihai xu
 Attachments: YARN-3768.000.patch, YARN-3768.001.patch


 Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because java will not return empty strings from the 
 split method. Similar to this 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595552#comment-14595552
 ] 

Hadoop QA commented on YARN-3768:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  5s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 53s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| | |  40m  4s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740968/YARN-3768.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8307/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8307/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8307/console |


This message was automatically generated.

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner
Assignee: zhihai xu
 Attachments: YARN-3768.000.patch, YARN-3768.001.patch


 Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because java will not return empty strings from the 
 split method. Similar to this 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-06-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3798:
--
Attachment: YARN-3798-2.7.002.patch

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 

[jira] [Commented] (YARN-3360) Add JMX metrics to TimelineDataManager

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596630#comment-14596630
 ] 

Hadoop QA commented on YARN-3360:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 25s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 29s | The applied patch generated  
19 new checkstyle issues (total was 7, now 26). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 59s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 10s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| | |  39m 36s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741115/YARN-3360.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8313/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8313/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8313/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8313/console |


This message was automatically generated.

 Add JMX metrics to TimelineDataManager
 --

 Key: YARN-3360
 URL: https://issues.apache.org/jira/browse/YARN-3360
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: BB2015-05-TBR
 Attachments: YARN-3360.001.patch, YARN-3360.002.patch, 
 YARN-3360.003.patch


 The TimelineDataManager currently has no metrics, outside of the standard JVM 
 metrics.  It would be very useful to at least log basic counts of method 
 calls, time spent in those calls, and number of entities/events involved.
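
A minimal sketch of the kind of counters the description asks for, using plain 
JDK atomics; the actual patches presumably wire these into Hadoop's metrics 
system and JMX, and every name below is illustrative rather than taken from the 
patch:
{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Illustrative per-operation counters: calls, total time, and entities returned.
public class TimelineMetricsSketch {
  private final AtomicLong getEntitiesOps = new AtomicLong();
  private final AtomicLong getEntitiesTimeNs = new AtomicLong();
  private final AtomicLong entitiesReturned = new AtomicLong();

  public List<?> timeGetEntities(Supplier<List<?>> op) {
    long start = System.nanoTime();
    try {
      List<?> result = op.get();
      entitiesReturned.addAndGet(result == null ? 0 : result.size());
      return result;
    } finally {
      getEntitiesOps.incrementAndGet();
      getEntitiesTimeNs.addAndGet(System.nanoTime() - start);
    }
  }

  @Override
  public String toString() {
    long ops = getEntitiesOps.get();
    double avgMs = ops == 0 ? 0.0 : getEntitiesTimeNs.get() / 1e6 / ops;
    return String.format("getEntities: ops=%d avgMs=%.2f entities=%d",
        ops, avgMs, entitiesReturned.get());
  }
}
{code}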



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml

2015-06-22 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596327#comment-14596327
 ] 

Robert Kanter commented on YARN-3835:
-

+1

 hadoop-yarn-server-resourcemanager test package bundles core-site.xml, 
 yarn-site.xml
 

 Key: YARN-3835
 URL: https://issues.apache.org/jira/browse/YARN-3835
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Vamsee Yarlagadda
Assignee: Vamsee Yarlagadda
Priority: Minor
 Attachments: YARN-3835.patch


 It looks like by default YARN bundles core-site.xml and yarn-site.xml in the 
 test artifact of hadoop-yarn-server-resourcemanager, which means that any 
 downstream project which uses this as a dependency can have a problem picking 
 up the user-supplied/environment-supplied core-site.xml and yarn-site.xml.
 So we should ideally exclude these .xml files from being bundled into the 
 test-jar. (Similar to YARN-1748)
 I also proactively looked at other YARN modules where this might be 
 happening. 
 {code}
 vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml
 ./hadoop-yarn/conf/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml
 {code}
 And out of these only two modules (hadoop-yarn-server-resourcemanager, 
 hadoop-yarn-server-tests) are building test-jars. In future, if we start 
 building test-jar of other modules, we should exclude these xml files from 
 being bundled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-06-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3635:
-
Attachment: YARN-3635.4.patch

Sorry for my late response, [~vinodkv]. I just got some bandwidth to do the 
update.

The attached ver.4 addresses most of your comments: queue-placement-rules is now 
a separate module in the RM, the scheduler initializes it, and RMAppManager uses 
it to do queue placement.

The defined interfaces are not exactly the same as you suggested; I put in the 
minimal set of interfaces I had in mind. You can take a look at 
{{org.apache.hadoop.yarn.server.resourcemanager.placement}} for details.

The ver.4 patch also turns the original CapacityScheduler.QueueMapping into a 
rule: UserGroupPlacementRule.

Thoughts?
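
For readers without the patch handy, a rough, hypothetical sketch of what such a 
placement-rule abstraction could look like; the real classes live under 
org.apache.hadoop.yarn.server.resourcemanager.placement and will differ in names 
and signatures:
{code}
import java.util.List;

// Hypothetical shapes only; not the actual YARN-3635 interfaces.
interface PlacementRuleSketch {
  // Returns the queue the application should be placed in, or null if this rule
  // does not apply and the next rule should be consulted.
  String getQueueForApp(String requestedQueue, String user) throws Exception;
}

// A chain of rules consulted by an RMAppManager-like component before validation.
class PlacementManagerSketch {
  private final List<PlacementRuleSketch> rules;

  PlacementManagerSketch(List<PlacementRuleSketch> rules) { this.rules = rules; }

  String placeApplication(String requestedQueue, String user) throws Exception {
    for (PlacementRuleSketch rule : rules) {
      String queue = rule.getQueueForApp(requestedQueue, user);
      if (queue != null) {
        return queue;            // later validations then run against the mapped queue
      }
    }
    return requestedQueue;       // fall back to the queue the user asked for
  }
}
{code}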

 Get-queue-mapping should be a common interface of YarnScheduler
 ---

 Key: YARN-3635
 URL: https://issues.apache.org/jira/browse/YARN-3635
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
 YARN-3635.4.patch


 Currently, both the fair and capacity schedulers support queue mapping, which 
 lets the scheduler change the queue of an application after it is submitted.
 One issue with doing this inside a specific scheduler is: if the queue after 
 mapping has a different maximum_allocation/default-node-label-expression from 
 the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
 the wrong queue.
 I propose to make queue mapping a common interface of the scheduler, and have 
 RMAppManager set the mapped queue before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml

2015-06-22 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-3835:

Target Version/s: 2.8.0

 hadoop-yarn-server-resourcemanager test package bundles core-site.xml, 
 yarn-site.xml
 

 Key: YARN-3835
 URL: https://issues.apache.org/jira/browse/YARN-3835
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Vamsee Yarlagadda
Assignee: Vamsee Yarlagadda
Priority: Minor
 Attachments: YARN-3835.patch


 It looks like by default YARN bundles core-site.xml and yarn-site.xml in the 
 test artifact of hadoop-yarn-server-resourcemanager, which means that any 
 downstream project which uses this as a dependency can have a problem picking 
 up the user-supplied/environment-supplied core-site.xml and yarn-site.xml.
 So we should ideally exclude these .xml files from being bundled into the 
 test-jar. (Similar to YARN-1748)
 I also proactively looked at other YARN modules where this might be 
 happening. 
 {code}
 vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml
 ./hadoop-yarn/conf/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml
 ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml
 {code}
 And out of these only two modules (hadoop-yarn-server-resourcemanager, 
 hadoop-yarn-server-tests) are building test-jars. In future, if we start 
 building test-jar of other modules, we should exclude these xml files from 
 being bundled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596337#comment-14596337
 ] 

Hadoop QA commented on YARN-2902:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 34s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 33s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 36s | The applied patch generated  
25 new checkstyle issues (total was 168, now 187). |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 24s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  43m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741076/YARN-2902.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8309/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8309/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8309/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8309/console |


This message was automatically generated.

 Killing a container that is localizing can orphan resources in the 
 DOWNLOADING state
 

 Key: YARN-2902
 URL: https://issues.apache.org/jira/browse/YARN-2902
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-2902.002.patch, YARN-2902.03.patch, YARN-2902.patch


 If a container is in the process of localizing when it is stopped/killed, then 
 resources are left in the DOWNLOADING state.  If no other container comes 
 along and requests these resources, they linger around with no reference 
 counts but aren't cleaned up during normal cache cleanup scans, since the scan 
 never deletes resources in the DOWNLOADING state even if their reference count 
 is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-06-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596481#comment-14596481
 ] 

Jian He commented on YARN-1963:
---

I think we need to move this forward.

Overall, I prefer numeric priority to label-based priority because the former is 
simpler and more flexible if a user wants to define a wide range of priorities, 
it needs no extra configs, and users do not need to be re-educated about the 
mapping every time the mapping changes.

Also, one problem: if we refresh the priority mapping while some existing 
long-running jobs are already running at a certain priority, how do we map the 
previous priority range to the new one?

In addition, if everyone runs their application at “VERY_HIGH” priority, the 
“HIGH” priority, though named “HIGH”, is not really the “HIGH” priority any 
more; it effectively becomes the “LOWEST” priority. My point is that the 
importance of a priority only makes sense when compared with its peers. In that 
sense, I think a utility that surfaces how applications are distributed across 
each priority, so that users can reason about where to place their application, 
may be more useful than adding a static naming mapping to let people reason 
about the relative importance of a priority by its name. 

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: 0001-YARN-1963-prototype.patch, YARN Application 
 Priorities Design.pdf, YARN Application Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-22 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596443#comment-14596443
 ] 

Gera Shegalov commented on YARN-3768:
-

Instead of executing two regexes (first directly via Pattern p = 
Pattern.compile(Shell.getEnvironmentVariableRegex()), and then via split), can 
we simply match via a single regex? We can use a capture group to get the 
value.
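
A self-contained sketch of the single-regex idea, using a plain name pattern as 
a stand-in for Shell.getEnvironmentVariableRegex(); the value group is allowed 
to be empty, which also covers the no-value case from this JIRA:
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvRegexSketch {
  // Group 1 = variable name, group 2 = value (possibly empty). The name pattern
  // here stands in for Shell.getEnvironmentVariableRegex().
  private static final Pattern ENV_ENTRY =
      Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)=([^,]*)");

  public static void main(String[] args) {
    String env = "JAVA_HOME=/usr/java,EMPTY_VAR=,LD_LIBRARY_PATH=/lib";
    Matcher m = ENV_ENTRY.matcher(env);
    while (m.find()) {
      String name = m.group(1);
      String value = m.group(2);  // "" for EMPTY_VAR; no split or index handling needed
      System.out.println(name + " -> '" + value + "'");
    }
  }
}
{code}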

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner
Assignee: zhihai xu
 Attachments: YARN-3768.000.patch, YARN-3768.001.patch


 Looking at line 80 of org.apache.hadoop.yarn.util.Apps an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because java will not return empty strings from the 
 split method. Similar to this 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596461#comment-14596461
 ] 

Hadoop QA commented on YARN-3798:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740206/YARN-3798-branch-2.7.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8310/console |


This message was automatically generated.

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-branch-2.7.002.patch, 
 YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 

[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596470#comment-14596470
 ] 

Hadoop QA commented on YARN-3798:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741098/YARN-3798-2.7.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8312/console |


This message was automatically generated.

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 

[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler

2015-06-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596441#comment-14596441
 ] 

Jian He commented on YARN-3790:
---

lgtm, thanks [~zxu] and [~rohithsharma]

 TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in 
 trunk for FS scheduler
 

 Key: YARN-3790
 URL: https://issues.apache.org/jira/browse/YARN-3790
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, test
Reporter: Rohith Sharma K S
Assignee: zhihai xu
 Attachments: YARN-3790.000.patch


 Failure trace is as follows
 {noformat}
 Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
 testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart)
   Time elapsed: 6.502 sec   FAILURE!
 java.lang.AssertionError: expected:6144 but was:8192
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3820) Collect disks usages on the node

2015-06-22 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596451#comment-14596451
 ] 

Inigo Goiri commented on YARN-3820:
---

You may want to exclude the change in CommonNodeLabelsManager.java as it's not 
related to this patch.

 Collect disks usages on the node
 

 Key: YARN-3820
 URL: https://issues.apache.org/jira/browse/YARN-3820
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Robert Grandl
Assignee: Robert Grandl
  Labels: yarn-common, yarn-util
 Attachments: YARN-3820-1.patch, YARN-3820-2.patch, YARN-3820-3.patch, 
 YARN-3820-4.patch


 In this JIRA we propose to collect disks usages on a node. This JIRA is part 
 of a larger effort of monitoring resource usages on the nodes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-06-22 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596437#comment-14596437
 ] 

Giovanni Matteo Fumarola commented on YARN-3116:


Thanks [~zjshen] for quickly reviewing the patch and for your comments.

   1. I agree that ContainerTokenIdentifier would be a better place to do it, so 
that we keep the flag internal, but the ContainerTokenIdentifier is created 
before the state transition in RMAppAttempt that sets the AM flag in 
RMContainer. I can try to recreate the ContainerTokenIdentifier at AM launch, 
but that looks unwieldy. Do you have any suggestions on how to do it more 
cleanly?

   2. Again a good observation; I'll add this in the next iteration of the 
patch based on your suggestion for (1) above.

 [Collector wireup] We need an assured way to determine if a container is an 
 AM container on NM
 --

 Key: YARN-3116
 URL: https://issues.apache.org/jira/browse/YARN-3116
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, timelineserver
Reporter: Zhijie Shen
Assignee: Giovanni Matteo Fumarola
 Attachments: YARN-3116.patch


 In YARN-3030, to start the per-app aggregator only for a started AM 
 container,  we need to determine if the container is an AM container or not 
 from the context in the NM (we can do it on the RM). This information is 
 missing, so we worked around it by considering the container with ID _01 as 
 the AM container. Unfortunately, this is neither a necessary nor a sufficient 
 condition. We need a way to determine whether a container is an AM 
 container on the NM. We can add a flag to the container object or create an API to 
 do the judgement. Perhaps the distributed AM information may also be useful 
 to YARN-2877.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-06-22 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596987#comment-14596987
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

[~zxu] In the case of SessionMovedException, I think the zk client should retry 
connecting to another zk server with the same session id automatically, without 
creating a new session. If we create a new session for SessionMovedException, 
we'll face the same issue that Bibin and Varun reported. With the new patch, 
SessionMovedException is handled within the same session: after we get 
SessionMovedException, the zk client in ZKRMStateStore waits for the specified 
period to pass and then retries the operations. At that point, the zk server 
should detect that the session has moved and close the client connection, 
as the ZooKeeper documentation mentions: 
http://zookeeper.apache.org/doc/r3.4.0/zookeeperProgrammers.html#ch_zkSessions
{quote}
When the delayed packet arrives at the first server, the old server detects 
that the session has moved, and closes the client connection.
{quote}

If this behaviour is not the same as described, we should fix ZooKeeper.
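
To make the distinction concrete, here is a hedged sketch (names and structure 
are illustrative, not the actual ZKRMStateStore retry code) of retrying a ZK 
operation while rebuilding the session only on SessionExpiredException; 
SessionMoved and ConnectionLoss are retried on the existing session after the 
retry interval:
{code}
import java.util.concurrent.Callable;
import org.apache.zookeeper.KeeperException;

public final class ZkRetrySketch {
  /** Hook for rebuilding the ZooKeeper handle; only used on session expiry. */
  public interface SessionRecreator { void recreateSession() throws Exception; }

  public static <T> T runWithRetries(Callable<T> op, SessionRecreator recreator,
                                     int maxRetries, long retryIntervalMs) throws Exception {
    KeeperException last = null;
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (KeeperException.SessionExpiredException e) {
        last = e;
        recreator.recreateSession();   // only an expired session needs a brand-new one
      } catch (KeeperException.SessionMovedException
             | KeeperException.ConnectionLossException e) {
        last = e;                      // keep the same session; the old server closes the stale link
      }
      Thread.sleep(retryIntervalMs);   // wait out the retry interval before the next attempt
    }
    throw last;                        // maxed out retries, give up
  }
}
{code}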

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at 

[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596994#comment-14596994
 ] 

Hadoop QA commented on YARN-3792:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 29s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 38s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 59s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   8m 10s | Tests passed in 
hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m 11s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  51m 49s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 17s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | | 115m 20s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741171/YARN-3792-YARN-2928.004.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 8c036a1 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8319/console |


This message was automatically generated.

 Test case failures in TestDistributedShell and some issue fixes related to 
 ATSV2
 

 Key: YARN-3792
 URL: https://issues.apache.org/jira/browse/YARN-3792
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: YARN-3792-YARN-2928.001.patch, 
 YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch, 
 YARN-3792-YARN-2928.004.patch


 # Encountered [testcase 
 failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] 
 which were happening even without the patch modifications in YARN-3044:
 TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
 TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
 TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression
 # Remove unused {{enableATSV1}} in TestDistributedShell
 # Container metrics need to be published only for the v2 test cases of 
 TestDistributedShell
 # Nullpointer was thrown in 

[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596989#comment-14596989
 ] 

Hadoop QA commented on YARN-3705:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 55s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 18s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   5m 41s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |  50m 55s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 54s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741178/YARN-3705.002.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / fac4e04 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8320/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8320/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8320/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8320/console |


This message was automatically generated.

 forcemanual transitionToStandby in RM-HA automatic-failover mode should 
 change elector state
 

 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: YARN-3705.001.patch, YARN-3705.002.patch


 Executing {{rmadmin -transitionToStandby --forcemanual}} in 
 automatic-failover.enabled mode makes the ResourceManager standby while keeping 
 the state of the ActiveStandbyElector. It should make the elector quit and 
 rejoin so that other candidates can be promoted; otherwise, forcemanual 
 transition should not be allowed in automatic-failover mode, in order to avoid 
 confusion.
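 For reference, the forced transition under discussion is issued like this 
 ({{rm1}} is a placeholder for the configured RM id):
 {code}
 # With automatic failover enabled, this currently leaves the elector state untouched.
 yarn rmadmin -transitionToStandby --forcemanual rm1
 {code}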



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-22 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3800:

Attachment: YARN-3800.003.patch

fixed checkstyle

 Simplify inmemory state for ReservationAllocation
 -

 Key: YARN-3800
 URL: https://issues.apache.org/jira/browse/YARN-3800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
 YARN-3800.002.patch, YARN-3800.003.patch


 Instead of storing the ReservationRequest, we store the Resource for 
 allocations, as that's the only thing we need. Ultimately we convert 
 everything to resources anyway.
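 A hedged sketch of the simplification (the class, the placeholder Interval key 
 and the method names are illustrative only, not the exact patch):
 {code}
 import java.util.Map;
 import java.util.TreeMap;
 import org.apache.hadoop.yarn.api.records.Resource;
 
 // Sketch: keep only a Resource per interval instead of the whole
 // ReservationRequest, since everything is converted to resources in the end.
 class SimplifiedReservationState {
   /** Placeholder for the reservation interval key used by the real code. */
   static final class Interval implements Comparable<Interval> {
     final long startTime;
     final long endTime;
     Interval(long startTime, long endTime) {
       this.startTime = startTime;
       this.endTime = endTime;
     }
     @Override public int compareTo(Interval o) {
       return Long.compare(startTime, o.startTime);
     }
   }
 
   private final Map<Interval, Resource> allocations = new TreeMap<>();
 
   void addAllocation(Interval interval, Resource resource) {
     allocations.put(interval, resource);
   }
 
   Resource getResourcesAt(Interval interval) {
     return allocations.get(interval);
   }
 }
 {code}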



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state

2015-06-22 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-3705:
---
Attachment: YARN-3705.003.patch

The test failure is relevant. ResourceManager#handleTransitionToStandBy is 
expected to be used only when automatic failover is enabled. I am attaching 003, 
which addresses the non-automatic-failover case too. 

 forcemanual transitionToStandby in RM-HA automatic-failover mode should 
 change elector state
 

 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: YARN-3705.001.patch, YARN-3705.002.patch, 
 YARN-3705.003.patch


 Executing {{rmadmin -transitionToStandby --forcemanual}} in 
 automatic-failover.enabled mode makes the ResourceManager standby while keeping 
 the state of the ActiveStandbyElector. It should make the elector quit and 
 rejoin so that other candidates can be promoted; otherwise, forcemanual 
 transition should not be allowed in automatic-failover mode, in order to avoid 
 confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-06-22 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596990#comment-14596990
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

[~vinodkv] The patch is only applied to branch-2.7 because the ZKRMStateStore of 
2.8 and later uses Apache Curator. I'm running the tests locally under 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager,
 so I'll report the result manually. Double checking is welcome.

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at 

[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-06-22 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596997#comment-14596997
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

After the zk server closes the client connection, the zk client in ZKRMStateStore 
will receive CONNECTIONLOSS and handle it without creating a new session.

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 

[jira] [Updated] (YARN-3841) [Storage implementation] Create HDFS backing storage implementation for ATS writes

2015-06-22 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3841:
-
Summary: [Storage implementation] Create HDFS backing storage 
implementation for ATS writes  (was: [Storage abstraction] Create HDFS backing 
storage implementation for ATS writes)

 [Storage implementation] Create HDFS backing storage implementation for ATS 
 writes
 --

 Key: YARN-3841
 URL: https://issues.apache.org/jira/browse/YARN-3841
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa

 HDFS backing storage is useful for the following scenarios:
 1. For Hadoop clusters which don't run HBase.
 2. For fallback from HBase when the HBase cluster is temporarily unavailable. 
 Quoting the ATS design document of YARN-2928:
 {quote}
 In the case the HBase
 storage is not available, the plugin should buffer the writes temporarily 
 (e.g. HDFS), and flush
 them once the storage comes back online. Reading and writing to hdfs as the 
 the backup storage
 could potentially use the HDFS writer plugin unless the complexity of 
 generalizing the HDFS
 writer plugin for this purpose exceeds the benefits of reusing it here.
 {quote}
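 To make the fallback idea a bit more concrete, a minimal hedged sketch 
 ({{PrimaryWriter}}, the buffer file layout and the class name are made up, not a 
 proposed implementation):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
 // Sketch: buffer writes on HDFS when the primary (HBase-backed) storage is
 // unavailable, so they can be replayed once it comes back online.
 class BufferingTimelineWriter {
   interface PrimaryWriter { void write(String entityJson) throws IOException; }
 
   private final PrimaryWriter primary;
   private final FileSystem fs;
   private final Path bufferDir;
 
   BufferingTimelineWriter(PrimaryWriter primary, Configuration conf, Path bufferDir)
       throws IOException {
     this.primary = primary;
     this.fs = FileSystem.get(conf);
     this.bufferDir = bufferDir;
   }
 
   void write(String entityJson) throws IOException {
     try {
       primary.write(entityJson);            // normal path
     } catch (IOException primaryDown) {
       // Fallback path: append to (or create) a buffer file under HDFS.
       Path bufferFile = new Path(bufferDir, "buffered-entities");
       try (FSDataOutputStream out = fs.exists(bufferFile)
           ? fs.append(bufferFile) : fs.create(bufferFile)) {
         out.writeUTF(entityJson);
       }
     }
   }
 }
 {code}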



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3800) Simplify inmemory state for ReservationAllocation

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597189#comment-14597189
 ] 

Hadoop QA commented on YARN-3800:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m  5s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |  10m 39s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  12m  6s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 28s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m  0s | The applied patch generated  1 
new checkstyle issues (total was 54, now 49). |
| {color:green}+1{color} | whitespace |   0m  5s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 58s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 35s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  45m 33s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 11s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
 |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart 
|
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741211/YARN-3800.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 99271b7 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8321/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8321/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8321/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8321/console |


This message was automatically generated.

 Simplify inmemory state for ReservationAllocation
 -

 Key: YARN-3800
 URL: https://issues.apache.org/jira/browse/YARN-3800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3800.001.patch, YARN-3800.002.patch, 
 YARN-3800.002.patch, YARN-3800.003.patch


 Instead of storing the ReservationRequest, we store the Resource for 
 allocations, as that's the only thing we need. Ultimately we convert 
 everything to resources anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597187#comment-14597187
 ] 

Hadoop QA commented on YARN-3838:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 52s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 34s | The applied patch generated  1 
new checkstyle issues (total was 39, now 40). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m  2s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| | |  66m 50s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740917/0001-YARN-3838.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 99271b7 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8322/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8322/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8322/console |


This message was automatically generated.

 Rest API failing when ip configured in RM address in secure https mode
 --

 Key: YARN-3838
 URL: https://issues.apache.org/jira/browse/YARN-3838
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 
 0001-YARN-3838.patch, 0002-YARN-3810.patch


 Steps to reproduce
 ===
 1.Configure hadoop.http.authentication.kerberos.principal as below
 {code:xml}
   <property>
     <name>hadoop.http.authentication.kerberos.principal</name>
     <value>HTTP/_h...@hadoop.com</value>
   </property>
 {code}
 2. Configure the IP address in the RM web address as well. 
 3. Start up the RM. 
 Call the RM REST API: {{curl -i -k --insecure --negotiate -u : https://<RM IP>/ws/v1/cluster/info}}
 *Actual*
 The REST API call fails:
 {code}
 2015-06-16 19:03:49,845 DEBUG 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter: 
 Authentication exception: GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos credentails)
 org.apache.hadoop.security.authentication.client.AuthenticationException: 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos credentails)
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399)
   at 
 org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519)
   at 
 org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
 {code}



--
This message was sent by 

[jira] [Commented] (YARN-3705) forcemanual transitionToStandby in RM-HA automatic-failover mode should change elector state

2015-06-22 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597176#comment-14597176
 ] 

Masatake Iwasaki commented on YARN-3705:


bq. ResourceManager#handleTransitionToStandBy is expected to be used only when 
automatic failover enabled.

This was not true. It checks {{isHAEnabled}}, not {{isAutomaticFailoverEnabled}}: 
{{ResourceManager#handleTransitionToStandBy}} is a no-op if 
{{RMContext#isHAEnabled}} is false.

{code}
  public void handleTransitionToStandBy() {
    if (rmContext.isHAEnabled()) {
      try {
        // Transition to standby and reinit active services
        LOG.info("Transitioning RM to Standby mode");
        transitionToStandby(true);
        adminService.resetLeaderElection();
        return;
      } catch (Exception e) {
        LOG.fatal("Failed to transition RM to Standby mode.");
        ExitUtil.terminate(1, e);
      }
    }
  }
{code}

It seems strange that doing nothing in transitionToStandby when {{isHAEnabled}} is 
false affects tests for HA...
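For what it's worth, a hedged sketch of the "quit and rejoin the elector" idea 
(this is not the patch; it only calls the existing 
{{ActiveStandbyElector#quitElection}} / {{joinElection}} APIs, and how the elector 
and the active-node data are obtained is left out):
{code}
import org.apache.hadoop.ha.ActiveStandbyElector;

// Sketch only: step down so another RM can become active, then rejoin the
// election as a standby candidate.
class ElectorRejoin {
  static void forceStandbyAndRejoin(ActiveStandbyElector elector,
                                    byte[] localActiveNodeInfo) {
    elector.quitElection(false);                 // leave the election, no fencing
    elector.joinElection(localActiveNodeInfo);   // rejoin as a candidate
  }
}
{code}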


 forcemanual transitionToStandby in RM-HA automatic-failover mode should 
 change elector state
 

 Key: YARN-3705
 URL: https://issues.apache.org/jira/browse/YARN-3705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
 Attachments: YARN-3705.001.patch, YARN-3705.002.patch, 
 YARN-3705.003.patch


 Executing {{rmadmin -transitionToStandby --forcemanual}} in 
 automatic-failover.enabled mode makes the ResourceManager standby while keeping 
 the state of the ActiveStandbyElector. It should make the elector quit and 
 rejoin so that other candidates can be promoted; otherwise, forcemanual 
 transition should not be allowed in automatic-failover mode, in order to avoid 
 confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3841) [Storage abstraction] Create HDFS backing storage implementation for ATS writes

2015-06-22 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3841:
-
Description: 
HDFS backing storage is useful for the following scenarios:
1. For Hadoop clusters which don't run HBase.
2. For fallback from HBase when the HBase cluster is temporarily unavailable. Quoting 
the ATS design document of YARN-2928:
{quote}
In the case the HBase
storage is not available, the plugin should buffer the writes temporarily (e.g. 
HDFS), and flush
them once the storage comes back online. Reading and writing to hdfs as the the 
backup storage
could potentially use the HDFS writer plugin unless the complexity of 
generalizing the HDFS
writer plugin for this purpose exceeds the benefits of reusing it here.
{quote}


  was:
HDFS backing storage is useful for following scenarios.
1. For Hadoop clusters which don't run HBase.
2. For fallback from HBase when HBase cluster is temporary unavailable. 
{quote}
In the case the HBase
storage is not available, the plugin should buffer the writes temporarily (e.g. 
HDFS), and flush
them once the storage comes back online. Reading and writing to hdfs as the the 
backup storage
could potentially use the HDFS writer plugin unless the complexity of 
generalizing the HDFS
writer plugin for this purpose exceeds the benefits of reusing it here.
{quote}



 [Storage abstraction] Create HDFS backing storage implementation for ATS 
 writes
 ---

 Key: YARN-3841
 URL: https://issues.apache.org/jira/browse/YARN-3841
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa

 HDFS backing storage is useful for the following scenarios:
 1. For Hadoop clusters which don't run HBase.
 2. For fallback from HBase when the HBase cluster is temporarily unavailable. 
 Quoting the ATS design document of YARN-2928:
 {quote}
 In the case the HBase
 storage is not available, the plugin should buffer the writes temporarily 
 (e.g. HDFS), and flush
 them once the storage comes back online. Reading and writing to hdfs as the 
 the backup storage
 could potentially use the HDFS writer plugin unless the complexity of 
 generalizing the HDFS
 writer plugin for this purpose exceeds the benefits of reusing it here.
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3841) [Storage abstraction] Create HDFS backing storage implementation for ATS writes

2015-06-22 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created YARN-3841:


 Summary: [Storage abstraction] Create HDFS backing storage 
implementation for ATS writes
 Key: YARN-3841
 URL: https://issues.apache.org/jira/browse/YARN-3841
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa


HDFS backing storage is useful for the following scenarios:
1. For Hadoop clusters which don't run HBase.
2. For fallback from HBase when the HBase cluster is temporarily unavailable. 
{quote}
In the case the HBase
storage is not available, the plugin should buffer the writes temporarily (e.g. 
HDFS), and flush
them once the storage comes back online. Reading and writing to hdfs as the the 
backup storage
could potentially use the HDFS writer plugin unless the complexity of 
generalizing the HDFS
writer plugin for this purpose exceeds the benefits of reusing it here.
{quote}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui bug on main view after application number 9999

2015-06-22 Thread LINTE (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595896#comment-14595896
 ] 

LINTE commented on YARN-3840:
-

Hi,

There is no Java stack trace for this bug.
I think the property yarn.resourcemanager.max-completed-applications is the cause 
(the default value is 10000), but it doesn't seem to work properly.

Maybe yarn.resourcemanager.max-completed-applications only takes effect on the 
ResourceManager GUI.
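
For reference, the property under discussion is set in yarn-site.xml like this 
(the value shown is just the default):
{code:xml}
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <!-- Default: 10000 completed applications are retained by the RM. -->
  <value>10000</value>
</property>
{code}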

Regards,

 Resource Manager web ui bug on main view after application number 9999
 --

 Key: YARN-3840
 URL: https://issues.apache.org/jira/browse/YARN-3840
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Centos 6.6
 Java 1.7
Reporter: LINTE

 On the WEBUI, the global main view page 
 http://resourcemanager:8088/cluster/apps doesn't display applications over 
 9999.
 With command line it works (# yarn application -list).
 Regards,
 Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595971#comment-14595971
 ] 

Hudson commented on YARN-3834:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2164/])
YARN-3834. Scrub debug logging of tokens during resource localization. 
Contributed by Chris Nauroth (xgong: rev 
6c7a9d502a633b5aca75c9798f19ce4a5729014e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


 Scrub debug logging of tokens during resource localization.
 ---

 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.8.0

 Attachments: YARN-3834.001.patch


 During resource localization, the NodeManager logs tokens at debug level to 
 aid troubleshooting.  This includes the full token representation.  Best 
 practice is to avoid logging anything secret, even at debug level.  We can 
 improve on this by changing the logging to use a scrubbed representation of 
 the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595984#comment-14595984
 ] 

Hudson commented on YARN-3834:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #225 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/225/])
YARN-3834. Scrub debug logging of tokens during resource localization. 
Contributed by Chris Nauroth (xgong: rev 
6c7a9d502a633b5aca75c9798f19ce4a5729014e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 Scrub debug logging of tokens during resource localization.
 ---

 Key: YARN-3834
 URL: https://issues.apache.org/jira/browse/YARN-3834
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.8.0

 Attachments: YARN-3834.001.patch


 During resource localization, the NodeManager logs tokens at debug level to 
 aid troubleshooting.  This includes the full token representation.  Best 
 practice is to avoid logging anything secret, even at debug level.  We can 
 improve on this by changing the logging to use a scrubbed representation of 
 the token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)