[jira] [Assigned] (YARN-3235) Support uniformed scheduler configuration in FairScheduler

2015-02-20 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3235:
---

Assignee: Naganarasimha G R

 Support uniformed scheduler configuration in FairScheduler
 --

 Key: YARN-3235
 URL: https://issues.apache.org/jira/browse/YARN-3235
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Naganarasimha G R





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3235) Support uniformed scheduler configuration in FairScheduler

2015-02-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328829#comment-14328829
 ] 

Naganarasimha G R commented on YARN-3235:
-

Hi [~wangda], [~kasha],
I would like to work on this issue, hence I am assigning it to myself. If you 
have any other plan, or somebody else is already working on this, please let 
me know...

 Support uniformed scheduler configuration in FairScheduler
 --

 Key: YARN-3235
 URL: https://issues.apache.org/jira/browse/YARN-3235
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Naganarasimha G R





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328848#comment-14328848
 ] 

Hudson commented on YARN-933:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #844 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/844/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 AM max retries is configured as 3 on both the client and the RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make sure that pinging the NM1 machine from the RM machine by IP 
 succeeds, but pinging by hostname fails.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a 
 connection loss happens.
 Observation:
 ==
 After AppAttempt_1 has moved to the failed state, the release of the container 
 for AppAttempt_1 and the application removal are successful. A new 
 AppAttempt_2 is spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. The RM side again tries to launch AppAttempt_1, which fails with 
 InvalidStateTransitonException.
 3. The client exited after AppAttempt_1 finished [but the job is actually 
 still running], even though 3 app attempts are configured and the remaining 
 app attempts are all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 LAUNCH_FAILED at FAILED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
 

[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328827#comment-14328827
 ] 

Hudson commented on YARN-3076:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #110 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/110/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto


 Add API/Implementation to YarnClient to retrieve label-to-node mapping
 --

 Key: YARN-3076
 URL: https://issues.apache.org/jira/browse/YARN-3076
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
 YARN-3076.003.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328826#comment-14328826
 ] 

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #110 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/110/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 AM max retries is configured as 3 on both the client and the RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make sure that pinging the NM1 machine from the RM machine by IP 
 succeeds, but pinging by hostname fails.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a 
 connection loss happens.
 Observation:
 ==
 After AppAttempt_1 has moved to the failed state, the release of the container 
 for AppAttempt_1 and the application removal are successful. A new 
 AppAttempt_2 is spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. The RM side again tries to launch AppAttempt_1, which fails with 
 InvalidStateTransitonException.
 3. The client exited after AppAttempt_1 finished [but the job is actually 
 still running], even though 3 app attempts are configured and the remaining 
 app attempts are all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 LAUNCH_FAILED at FAILED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)

[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328849#comment-14328849
 ] 

Hudson commented on YARN-3076:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #844 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/844/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java


 Add API/Implementation to YarnClient to retrieve label-to-node mapping
 --

 Key: YARN-3076
 URL: https://issues.apache.org/jira/browse/YARN-3076
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
 YARN-3076.003.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-02-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328853#comment-14328853
 ] 

Sunil G commented on YARN-3226:
---

Yes. I also feel we could remove some dead tabs and then add a new tab called 
"state".
A filtering mechanism can be added in a new tab which just shows decommissioned 
nodes. I will look into this and work along the same lines. Thank you.

 UI changes for decommissioning node
 ---

 Key: YARN-3226
 URL: https://issues.apache.org/jira/browse/YARN-3226
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Sunil G

 Some initial thoughts:
 decommissioning nodes should still show up in the active nodes list since 
 they are still running containers. 
 A separate decommissioning tab to filter for those nodes would be nice, 
 although I suppose users can also just use the jquery table to sort/search for
 nodes in that state from the active nodes list if it's too crowded to add yet 
 another node
 state tab (or maybe get rid of some effectively dead tabs like the reboot 
 state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-02-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328681#comment-14328681
 ] 

Varun Saxena commented on YARN-3047:


Thanks [~sjlee0] for the review. I guess you are ok with the other responses to 
the initial review.
[~zjshen], can you have a look as well once you are free? I will upload a new 
patch based on comments from both of you.

 [Data Serving] Set up ATS reader with basic request serving structure and 
 lifecycle
 ---

 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3047.001.patch


 Per design in YARN-2938, set up the ATS reader as a service and implement the 
 basic structure as a service. It includes lifecycle management, request 
 serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3195) Add -help to yarn logs and nodes CLI command

2015-02-20 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3195:

Environment: (was: SUSE Linux SP3)
 Issue Type: Improvement  (was: Bug)

 Add -help to yarn logs and nodes CLI command
 

 Key: YARN-3195
 URL: https://issues.apache.org/jira/browse/YARN-3195
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.6.0
Reporter: Jagadesh Kiran N
Assignee: Jagadesh Kiran N
Priority: Minor
 Fix For: 2.7.0

 Attachments: Helptobe removed in Queue.png, YARN-3195.patch


 Help is a generic command and should not be placed here; because of this, 
 uniformity is missing compared to the other commands. Remove the -help command 
 inside ./yarn queue for uniformity with respect to the other commands.
 {code}
 SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
 queue -help
 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 usage: queue
 * -help  Displays help for all commands.*
  -status <Queue Name>   List queue information about given queue.
 SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
 queue
 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 Invalid Command Usage :
 usage: queue
 * -help  Displays help for all commands.*
  -status <Queue Name>   List queue information about given queue.
 {code}
 * -help  Displays help for all commands.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3195) Add -help to yarn logs and nodes CLI command

2015-02-20 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3195:

Summary: Add -help to yarn logs and nodes CLI command  (was: [YARN]Missing 
uniformity  In Yarn Queue CLI command)

 Add -help to yarn logs and nodes CLI command
 

 Key: YARN-3195
 URL: https://issues.apache.org/jira/browse/YARN-3195
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.6.0
 Environment: SUSE Linux SP3
Reporter: Jagadesh Kiran N
Assignee: Jagadesh Kiran N
Priority: Minor
 Fix For: 2.7.0

 Attachments: Helptobe removed in Queue.png, YARN-3195.patch


 Help is a generic command and should not be placed here; because of this, 
 uniformity is missing compared to the other commands. Remove the -help command 
 inside ./yarn queue for uniformity with respect to the other commands.
 {code}
 SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
 queue -help
 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 usage: queue
 * -help  Displays help for all commands.*
  -status <Queue Name>   List queue information about given queue.
 SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
 queue
 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 Invalid Command Usage :
 usage: queue
 * -help  Displays help for all commands.*
  -status <Queue Name>   List queue information about given queue.
 {code}
 * -help  Displays help for all commands.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command

2015-02-20 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328669#comment-14328669
 ] 

Devaraj K commented on YARN-3195:
-

Thanks [~jagadesh.kiran] for your work. Overall the patch is ok, except for the 
things below to take care of.

1. These tests are failing due to the patch changes; can you take a look at 
them?

{code:xml}
org.apache.hadoop.yarn.client.cli.TestLogsCLI
org.apache.hadoop.yarn.client.cli.TestYarnCLI
{code}


2. We need to return the success exit code (i.e. 0) for the help case, since 
the help command execution is a success.

{code:xml}
+if (args.length < 1 || args[0].equals("-help")) {
   printHelpMessage(printOpts);
   return -1;
 }
{code}
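
For illustration, a rough sketch of what I mean, based on the snippet above (the 
surrounding method is assumed; printHelpMessage and printOpts are taken from the 
quoted patch context):

{code}
if (args.length < 1) {
  // Invalid usage: print the help text and signal failure to the caller.
  printHelpMessage(printOpts);
  return -1;
} else if (args[0].equals("-help")) {
  // Explicit help request: print the help text and return the success exit code.
  printHelpMessage(printOpts);
  return 0;
}
{code}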

BTW, can you also take care of formatting the newly added code and avoid adding 
extra new lines when you create the patch?


 [YARN]Missing uniformity  In Yarn Queue CLI command
 ---

 Key: YARN-3195
 URL: https://issues.apache.org/jira/browse/YARN-3195
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.6.0
 Environment: SUSE Linux SP3
Reporter: Jagadesh Kiran N
Assignee: Jagadesh Kiran N
Priority: Minor
 Fix For: 2.7.0

 Attachments: Helptobe removed in Queue.png, YARN-3195.patch


 Help is a generic command and should not be placed here; because of this, 
 uniformity is missing compared to the other commands. Remove the -help command 
 inside ./yarn queue for uniformity with respect to the other commands.
 {code}
 SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
 queue -help
 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 usage: queue
 * -help  Displays help for all commands.*
  -status <Queue Name>   List queue information about given queue.
 SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
 queue
 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 Invalid Command Usage :
 usage: queue
 * -help  Displays help for all commands.*
  -status <Queue Name>   List queue information about given queue.
 {code}
 * -help  Displays help for all commands.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328675#comment-14328675
 ] 

Varun Saxena commented on YARN-3197:


[~devaraj.k]
Agree that changing only the log associated with null container completion to 
DEBUG doesn't seem right.

So what do you think? Should I print "Unknown container" or "Non-alive 
container"?

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328691#comment-14328691
 ] 

Devaraj K commented on YARN-3197:
-

I would be ok with either; better, you can combine both of them like below.

"Unknown or non-alive container " + containerId + " completed with 
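
For illustration, a rough sketch of how the guard might then read (the 
surrounding completedContainer method, the variable names, and the tail of the 
message are assumptions, not the actual patch):

{code}
// Sketch only: log once for an unknown/non-alive container and bail out.
if (rmContainer == null) {
  LOG.info("Unknown or non-alive container " + containerStatus.getContainerId()
      + " completed with event " + event);
  return;
}
{code}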


 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-20 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328862#comment-14328862
 ] 

Tsuyoshi OZAWA commented on YARN-2820:
--

[~zxu] Thank you for updating the patch. 

1. Should we create *WithRetries methods for 
deleteFile/renameFile/createFile/getFileStatus too? Note that we should update 
replaceFile to use renameFileWithRetries instead of calling 
fs.rename(srcPath, dstPath) directly:
{code}
  protected void replaceFile(Path srcPath, Path dstPath) throws Exception {
    if (fs.exists(dstPath)) {
      deleteFile(dstPath);
    } else {
      LOG.info("File doesn't exist. Skip deleting the file " + dstPath);
    }
    fs.rename(srcPath, dstPath);
  }
{code}

2. Should we create existsWithRetries and use it instead of fs.exists()?

3. Please move the *WithRetries methods below the following comment:
{code}

  // FileSystem related code

{code}
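
For reference, a rough sketch of the kind of *WithRetries wrapper being 
suggested (the retry-count and retry-interval field names are assumed, not part 
of the actual patch):

{code}
// Sketch only: retry fs.rename a bounded number of times before giving up.
private boolean renameFileWithRetries(final Path src, final Path dst) throws Exception {
  for (int retry = 0; ; retry++) {
    try {
      return fs.rename(src, dst);
    } catch (IOException e) {
      if (retry >= fsNumRetries) {      // assumed config-backed retry count
        throw e;
      }
      LOG.info("Exception while renaming " + src + " to " + dst + ", retrying", e);
      Thread.sleep(fsRetryIntervalMs);  // assumed config-backed retry interval
    }
  }
}
{code}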

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch


 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
 saw the following IOException cause the RM to shut down.
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
  
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
 at 

[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328948#comment-14328948
 ] 

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #2042 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2042/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 AM max retries is configured as 3 on both the client and the RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make sure that pinging the NM1 machine from the RM machine by IP 
 succeeds, but pinging by hostname fails.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a 
 connection loss happens.
 Observation:
 ==
 After AppAttempt_1 has moved to the failed state, the release of the container 
 for AppAttempt_1 and the application removal are successful. A new 
 AppAttempt_2 is spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. The RM side again tries to launch AppAttempt_1, which fails with 
 InvalidStateTransitonException.
 3. The client exited after AppAttempt_1 finished [but the job is actually 
 still running], even though 3 app attempts are configured and the remaining 
 app attempts are all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 LAUNCH_FAILED at FAILED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
 

[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328949#comment-14328949
 ] 

Hudson commented on YARN-3076:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2042 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2042/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java


 Add API/Implementation to YarnClient to retrieve label-to-node mapping
 --

 Key: YARN-3076
 URL: https://issues.apache.org/jira/browse/YARN-3076
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
 YARN-3076.003.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328955#comment-14328955
 ] 

Hudson commented on YARN-3076:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #101 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/101/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java


 Add API/Implementation to YarnClient to retrieve label-to-node mapping
 --

 Key: YARN-3076
 URL: https://issues.apache.org/jira/browse/YARN-3076
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
 YARN-3076.003.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328954#comment-14328954
 ] 

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #101 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/101/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 AM max retries is configured as 3 on both the client and the RM side.
 Step 1: Install a cluster with NMs on 2 machines.
 Step 2: Make sure that pinging the NM1 machine from the RM machine by IP 
 succeeds, but pinging by hostname fails.
 Step 3: Execute a job.
 Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a 
 connection loss happens.
 Observation:
 ==
 After AppAttempt_1 has moved to the failed state, the release of the container 
 for AppAttempt_1 and the application removal are successful. A new 
 AppAttempt_2 is spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. The RM side again tries to launch AppAttempt_1, which fails with 
 InvalidStateTransitonException.
 3. The client exited after AppAttempt_1 finished [but the job is actually 
 still running], even though 3 app attempts are configured and the remaining 
 app attempts are all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 LAUNCH_FAILED at FAILED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)

[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329001#comment-14329001
 ] 

Jason Lowe commented on YARN-3131:
--

bq. I do not think that continuously polling until RUNNING is a good idea. The 
most common case on a busy cluster is that an app can be submitted at time X 
but not start running until a long time later.

The patch does not cause the client to poll until the job is RUNNING.  It polls 
until the job has progressed past the SUBMITTED state.  The SUBMITTED state is 
a brief transient state before the ACCEPTED state.  So the client will wait 
approximately as long as it does today, and it fixes that flaky submit unit 
test in Tez.  It will not block until the AM is actually running.

bq. As I mentioned earlier, I still believe that doing some basic checks 
in-line in ClientRMService itself and throwing an exception back straight away 
is probably a better idea than polling for any RUNNING/FAILED state. 

I agree that a blocking method is much easier on the client, but I don't think 
this is an easy change to make in the short term.  Again I think it requires a 
major change to the RPC layer and the RM to support server-side asynchronous 
call handling, otherwise we have to throw an army of threads at the client 
service to avoid blocking other clients and that has scaling issues.  We could 
probably add an API to the scheduler to do an in-line sanity check on the 
requested queue (which is a backwards-incompatible change for schedulers not in 
the Hadoop repo).  However there are many other things that could go wrong 
during submission that take a long time to perform, such as saving the 
application state and renewing delegation tokens.  I'm not sure it's a win if 
we check for one thing in-line that could go wrong but still have to poll for 
all the other things that could go wrong.  In the end, Tez and other YARN 
clients need to know if the app was accepted or not.  The queue being wrong is 
just one of the ways the submit could fail.

Continuing to poll in the SUBMITTED state also meshes with the thoughts on the 
SUBMITTED state being something the client probably shouldn't see anyway.  See 
the discussion about NEW_SAVING and SUBMITTED in YARN-3230.

Thanks, Chang, for updating the patch.  Please investigate the unit test 
failure, as it looks like it could be related.  My only nit on the patch is 
that it would be a bit clearer and more efficient if we used EnumSet constants 
to capture the set of states we're waiting for the app to leave and the set of 
states that are failed-to-submit states.

I suppose another way to solve this problem is to take the approach discussed 
in YARN-3230 and have the RM not expose the NEW_SAVING and SUBMITTED states to 
the client -- they would just see NEW.  We'd have to leave the states in the 
enumeration for backwards compatibility, but we'd stop exposing them in app 
reports.  Any thoughts on that [~zjshen] or [~jianhe]?
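
For illustration, a rough sketch of the EnumSet idea (the constant names, the 
poll-interval field, and the surrounding loop are hypothetical, not the actual 
patch):

{code}
private static final EnumSet<YarnApplicationState> WAIT_STATES =
    EnumSet.of(YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING,
        YarnApplicationState.SUBMITTED);
private static final EnumSet<YarnApplicationState> FAILED_SUBMIT_STATES =
    EnumSet.of(YarnApplicationState.FAILED, YarnApplicationState.KILLED);

// Poll only while the app is still in a transient pre-accepted state.
ApplicationReport report = getApplicationReport(applicationId);
while (WAIT_STATES.contains(report.getYarnApplicationState())) {
  Thread.sleep(submitPollIntervalMillis);  // assumed poll-interval field
  report = getApplicationReport(applicationId);
}
if (FAILED_SUBMIT_STATES.contains(report.getYarnApplicationState())) {
  throw new YarnException("Application " + applicationId
      + " failed during submission: " + report.getDiagnostics());
}
{code}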

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job indeed gets submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329021#comment-14329021
 ] 

Hudson commented on YARN-3076:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #111 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/111/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* hadoop-yarn-project/CHANGES.txt


 Add API/Implementation to YarnClient to retrieve label-to-node mapping
 --

 Key: YARN-3076
 URL: https://issues.apache.org/jira/browse/YARN-3076
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
 YARN-3076.003.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error if it is unable to create LogWriter

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329505#comment-14329505
 ] 

Jason Lowe commented on YARN-3237:
--

I would just add it to this patch since it's a similar and very closely related 
fix.  The JIRA title can be changed to something like AppLogAggregatorImpl 
fails to log error cause.

 AppLogAggregatorImpl fails to log error if it is unable to create LogWriter
 ---

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329544#comment-14329544
 ] 

Hadoop QA commented on YARN-3237:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699927/YARN-3237.patch
  against trunk revision ce5bf92.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6685//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6685//console

This message is automatically generated.

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329643#comment-14329643
 ] 

Xuan Gong commented on YARN-3237:
-

Committed into trunk/branch-2. Thanks, Rushabh

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Fix For: 2.7.0

 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329737#comment-14329737
 ] 

Sangjin Lee commented on YARN-3039:
---

Thanks [~djp] for the doc!

Some high level comments:
- I'm also thinking that option 2 might be more feasible, mostly from the 
standpoint of limiting the risk. Having said that, I haven't followed YARN-913 
closely enough to see how close it is...
- The service discovery needs to work across all these different modes: NM aux 
service, standalone per-node daemon, and standalone per-app daemon. That needs 
to be one of the primary considerations in this.
- The failure scenarios need more detail in their own right; for this JIRA, I 
think it is sufficient to see how they may impact the service discovery and to 
design just enough for that.

{quote}
We need a per­application logical aggregator for ATS which provides aggregator 
service in
form of REST API to: RM, AM and NMs,
{quote}
The RM will likely not use the service discovery. For example, for RM to write 
the app started event, the timeline aggregator may not even be initialized yet.

{quote}
However, AM container could be reschedule to other
node for some reason (container failure, etc.), so we cannot guarantee the two 
are
always together.
{quote}
If the AM fails and starts in another node, the existing per-app aggregator 
should be shut down, and started on the new node. In fact, in the aux service 
setup, that comes most naturally. So I think we should try to keep that as much 
as possible.

{quote}
Failure Cases: 3. Aggregator failed (only):
{quote}
We're talking about the aggregator failing as a standalone daemon, correct?



 [Aggregator wireup] Implement ATS writer service discovery
 --

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-20 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-3239:
-

 Summary: WebAppProxy does not support a final tracking url which 
has query fragments and params 
 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Examples of failures:

Expected: 
{{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
Actual: {{http://uihost:8080}}

Tried with a minor change to remove the #. Saw a different issue:

Expected: 
{{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}

yarn application -status appId returns the expected value correctly. However, 
invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-20 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329741#comment-14329741
 ] 

Hitesh Shah commented on YARN-3239:
---

[~jlowe] [~jeagles] Have you come across any cases such as this? 

 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329744#comment-14329744
 ] 

Sangjin Lee commented on YARN-3166:
---

{quote}
1. In TimelineClient, we keep the existing methods that operate on the old data 
model, but mark them deprecated individually.

2. In TimelineClient, we create the new methods that operate on the new data 
model.
{quote}

Just to note the obvious, that would not work if it is a public interface that 
other code implements. If it is internal, yes, we could evolve the interface 
that way, but if it is a public interface, that is not an option... So for 
TimelineClient specifically, I think that holds true. But as a general rule, I 
think there could be issues...

 [Source organization] Decide detailed package structures for timeline service 
 v2 components
 ---

 Key: YARN-3166
 URL: https://issues.apache.org/jira/browse/YARN-3166
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Open this JIRA to track all discussions on detailed package structures for 
 timeline services v2. This JIRA is for discussion only.
 For our current timeline service v2 design, aggregator (previously called 
 writer) implementation is in hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
 In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
 want to put reader related implementations into hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.reader}}
 Both readers and aggregators will expose features that may be used by YARN 
 and other 3rd party components, such as aggregator/reader APIs. For those 
 features, maybe we would like to expose their interfaces to 
 hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
 Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329646#comment-14329646
 ] 

Rushabh S Shah commented on YARN-3237:
--

Thanks Xuan for committing.

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Fix For: 2.7.0

 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-20 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329708#comment-14329708
 ] 

Robert Kanter commented on YARN-2423:
-

[~vinodkv], as [~zjshen] pointed out earlier, these Java APIs are very tied to 
the REST APIs.  So, if there ends up being a compatibility problem with the 
Java API, I'd imagine the REST API would have the same problem.  And given that 
the new ATS will still take some time, it would be very useful to make a Java 
API available in the meantime, even if we have to eventually deprecate it.  
Even though we're making a new ATS, many users are still using the older one.  
As [~vanzin] pointed out, this JIRA is a blocker for SPARK-1537; it would make 
them better able to use the current ATS.  

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3238:
-
Attachment: YARN-3238.001.patch

Since the IPC layer is already retrying, it doesn't make sense to also retry at 
the YARN layer.  Attaching a patch that removes socket connection timeouts from 
the list of errors we retry at the YARN layer.  An alternate approach would be 
to retry at the YARN layer but explicitly tell the IPC layer to _not_ retry 
socket timeouts when creating the proxy.  This change seemed simpler and is 
what we had been doing all along before YARN-2613.
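
As a rough illustration of the approach (not the actual patch), a retry policy 
built from an exception-to-policy map could simply leave socket connect 
timeouts unmapped; the class and method names below are assumptions:
{code}
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Illustrative sketch only -- not the actual YARN-3238 patch.
final class NMProxyRetryPolicySketch {
  static RetryPolicy create(long maxWaitMs, long retryIntervalMs) {
    // Base policy used for the errors we still want the YARN layer to retry.
    RetryPolicy basePolicy = RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        maxWaitMs, retryIntervalMs, TimeUnit.MILLISECONDS);

    Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
        new HashMap<Class<? extends Exception>, RetryPolicy>();
    // Connection refused (e.g. NM restarting): keep retrying at the YARN layer.
    exceptionToPolicy.put(ConnectException.class, basePolicy);
    // org.apache.hadoop.net.ConnectTimeoutException is deliberately left out of
    // the map, so socket connect timeouts fall through to the default policy and
    // are not retried a second time on top of the IPC layer's own retries.
    return RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, exceptionToPolicy);
  }

  private NMProxyRetryPolicySketch() {
  }
}
{code}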

 Connection timeouts to nodemanagers are retried at multiple levels
 --

 Key: YARN-3238
 URL: https://issues.apache.org/jira/browse/YARN-3238
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker
 Attachments: YARN-3238.001.patch


 The IPC layer will retry connection timeouts automatically (see Client.java), 
 but we are also retrying them with YARN's RetryPolicy put in place when the 
 NM proxy is created.  This causes a two-level retry mechanism where the IPC 
 layer has already retried quite a few times (45 by default) for each YARN 
 RetryPolicy error that is retried.  The end result is that NM clients can 
 wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated YARN-3237:
-
Attachment: YARN-3237-v2.patch

Attaching a new patch to also add the error cause in the doContainerLogAggregation method.
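
For illustration only, a minimal sketch of the kind of change being described, 
i.e. passing the caught exception to the logger so the cause is recorded; the 
class here is hypothetical, not the actual patch:
{code}
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustration only (hypothetical class, not the actual YARN-3237 patch):
// pass the caught exception to the logger so the cause and stack trace are
// written out, instead of logging the message alone.
public class LogCauseExample {
  private static final Log LOG = LogFactory.getLog(LogCauseExample.class);

  public static void main(String[] args) {
    try {
      throw new IOException("simulated failure while creating the log writer");
    } catch (IOException e) {
      // Before: LOG.error("Cannot create writer for app app_id. ...");
      // After: the exception is passed as the second argument, so the cause
      // shows up in the NM log.
      LOG.error("Cannot create writer for app app_id."
          + " Disabling log-aggregation for this app.", e);
    }
  }
}
{code}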

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329655#comment-14329655
 ] 

Hudson commented on YARN-3237:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7169 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7169/])
YARN-3237. AppLogAggregatorImpl fails to log error cause. Contributed by 
(xgong: rev f56c65bb3eb9436b67de2df63098e26589e70e56)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/CHANGES.txt


 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Fix For: 2.7.0

 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329671#comment-14329671
 ] 

Li Lu commented on YARN-3033:
-

Hi [~sjlee0], thanks for the comments! I think I used the wrong name, 
Application Level Aggregator inside RM, here. A more appropriate name would be 
aggregator inside RM for app data. I agree that we should strictly limit the 
total number of this type of aggregator (one per RM seems reasonable for now). 
We may want to reuse the implementation of the web server/data storage layer 
from the aggregator collection for this aggregator as well, by simply wrapping 
it in an aggregator collection?

For the second point, yes, we can (and should always) reuse the same hbase 
client for all app level aggregators on the same node. 

 [Aggregator wireup] Implement NM starting the ATS writer companion
 --

 Key: YARN-3033
 URL: https://issues.apache.org/jira/browse/YARN-3033
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf


 Per design in YARN-2928, implement node managers starting the ATS writer 
 companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated YARN-3237:
-
Summary: AppLogAggregatorImpl fails to log error cause  (was: 
AppLogAggregatorImpl fails to log error if it is unable to create LogWriter)

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329636#comment-14329636
 ] 

Xuan Gong commented on YARN-3237:
-

+1 LGTM. Will commit

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error if it is unable to create LogWriter

2015-02-20 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329492#comment-14329492
 ] 

Rushabh S Shah commented on YARN-3237:
--

There is one more method, AppLogAggregatorImpl#doContainerLogAggregation, which 
doesn't log the exception either.
Do I need to create another JIRA, or should I add it in this patch?

 AppLogAggregatorImpl fails to log error if it is unable to create LogWriter
 ---

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3237) AppLogAggregatorImpl fails to log error cause

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329570#comment-14329570
 ] 

Hadoop QA commented on YARN-3237:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699931/YARN-3237-v2.patch
  against trunk revision 8c6ae0d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6686//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6686//console

This message is automatically generated.

 AppLogAggregatorImpl fails to log error cause
 -

 Key: YARN-3237
 URL: https://issues.apache.org/jira/browse/YARN-3237
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: YARN-3237-v2.patch, YARN-3237.patch


 AppLogAggregatorImpl fails to log the error if it is unable to create 
 LogWriter.
 Below is the log output:
 [LogAggregationService #24011] ERROR logaggregation.AppLogAggregatorImpl: 
 Cannot create writer for app app_id. Disabling log-aggregation for this app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3238:


 Summary: Connection timeouts to nodemanagers are retried at 
multiple levels
 Key: YARN-3238
 URL: https://issues.apache.org/jira/browse/YARN-3238
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker


The IPC layer will retry connection timeouts automatically (see Client.java), 
but we are also retrying them with YARN's RetryPolicy put in place when the NM 
proxy is created.  This causes a two-level retry mechanism where the IPC layer 
has already retried quite a few times (45 by default) for each YARN RetryPolicy 
error that is retried.  The end result is that NM clients can wait a very, very 
long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329748#comment-14329748
 ] 

Sangjin Lee commented on YARN-3033:
---

Agreed. Thanks for the clarification.

In summary, I just want to make sure that we do not impose an app-level context 
for the RM's aggregator.

 [Aggregator wireup] Implement NM starting the ATS writer companion
 --

 Key: YARN-3033
 URL: https://issues.apache.org/jira/browse/YARN-3033
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf


 Per design in YARN-2928, implement node managers starting the ATS writer 
 companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration

2015-02-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329949#comment-14329949
 ] 

Sunil G commented on YARN-2986:
---

Thank you [~leftnoteasy], this is a much-awaited ticket :)

I have a few inputs on the same.
1. 
{noformat}
  <policy-properties>
    <resource-calculator>
      org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
    </resource-calculator>
  </policy-properties>
{noformat}

and
{noformat}
  <policy-properties>
    <user-limit-factor>2</user-limit-factor>
{noformat}
This is inside a queue.

Do you mean that non-repeating items are kept outside the loop of queues and 
changing items are kept inside each queue?
However, if I have only one set of user-limit, node labels, etc., and if I keep 
all of those policy-properties outside the *queue* section, then will they be 
applicable for all queues?

If not, I suggest we could have a named policy-properties concept:
{noformat}
  <queue name="default">
    <state>RUNNING</state>
    <acl_submit_applications>*</acl_submit_applications>
    <acl_administer_queue>*</acl_administer_queue>
    <accessible-node-labels>x</accessible-node-labels>
    <policy-properties>gpu</policy-properties>
  </queue>

  <queue name="queueA">
    <state>RUNNING</state>
    <acl_submit_applications>*</acl_submit_applications>
    <acl_administer_queue>*</acl_administer_queue>
    <accessible-node-labels>x</accessible-node-labels>
    <policy-properties>gpu</policy-properties>
  </queue>

  <policy-properties name="gpu">
    <user-limit-factor>2</user-limit-factor>
    <node-labels>
      <node-label name="x">
        <capacity>20</capacity>
        <maximum-capacity>50</maximum-capacity>
      </node-label>
    </node-labels>
  </policy-properties>
{noformat}

It can be shared as needed across queues, and it will make the queue part more 
readable.


 (Umbrella) Support hierarchical and unified scheduler configuration
 ---

 Key: YARN-2986
 URL: https://issues.apache.org/jira/browse/YARN-2986
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
 Attachments: YARN-2986.1.patch


 Today's scheduler configuration is fragmented and non-intuitive, and needs to 
 be improved. Details in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-02-20 Thread zhihai xu (JIRA)
zhihai xu created YARN-3241:
---

 Summary: Leading space, trailing space and empty sub queue name 
may cause MetricsException for fair scheduler
 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu


Leading spaces, trailing spaces and empty sub queue names may cause a 
MetricsException (Metrics source XXX already exists!) when adding an application 
to FairScheduler.
The reason is that QueueMetrics parses the queue name differently from the 
QueueManager.
QueueMetrics uses Q_SPLITTER to parse the queue name; it will remove leading and 
trailing spaces in the sub queue names, and it will also remove empty sub queue 
names.
{code}
  static final Splitter Q_SPLITTER =
  Splitter.on('.').omitEmptyStrings().trimResults(); 
{code}
But QueueManager won't remove leading spaces, trailing spaces or empty sub queue 
names.
This will cause FSQueue and FSQueueMetrics to get out of sync.
QueueManager will think the two queue names are different, so it will try to 
create a new queue.
But FSQueueMetrics will treat the two queue names as the same, which will raise 
a Metrics source XXX already exists! MetricsException.
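
For illustration, a small standalone example of the mismatch described above, 
using the same Guava Splitter settings; the class is hypothetical and not part 
of any patch:
{code}
import com.google.common.base.Splitter;

// Illustration only (hypothetical class, not part of the patch): the same
// Splitter settings as Q_SPLITTER above normalize both spellings of the queue
// name to the same parts, while a plain string comparison (which is effectively
// what QueueManager does, per the description) treats them as different.
public class QueueNameSplitExample {
  static final Splitter Q_SPLITTER =
      Splitter.on('.').omitEmptyStrings().trimResults();

  public static void main(String[] args) {
    // Both of these split to the same parts: "root", "queueA"
    for (String part : Q_SPLITTER.split("root.queueA")) {
      System.out.println(part);
    }
    for (String part : Q_SPLITTER.split("root. queueA .")) {
      System.out.println(part);
    }
    // ...but as raw strings they differ, hence the duplicate metrics source.
    System.out.println("root.queueA".equals("root. queueA ."));  // false
  }
}
{code}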



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-20 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3240:
-

 Summary: [Data Mode] Implement client API to put generic entities
 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3233) Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration.

2015-02-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3233:
-
Attachment: YARN-3233.1.patch

Attached ver.1 patch, please share your thoughts. For an example, please refer to 
{{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/test-capacity-scheduler-hierarchy.xml}}
 in the patch.

 Implement scheduler common configuration parser and create an abstraction 
 CapacityScheduler configuration layer to support plain/hierarchy 
 configuration.
 -

 Key: YARN-3233
 URL: https://issues.apache.org/jira/browse/YARN-3233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3233.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3233) Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration.

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329967#comment-14329967
 ] 

Hadoop QA commented on YARN-3233:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699989/YARN-3233.1.patch
  against trunk revision 0d6af57.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6688//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6688//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6688//console

This message is automatically generated.

 Implement scheduler common configuration parser and create an abstraction 
 CapacityScheduler configuration layer to support plain/hierarchy 
 configuration.
 -

 Key: YARN-3233
 URL: https://issues.apache.org/jira/browse/YARN-3233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3233.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329848#comment-14329848
 ] 

Zhijie Shen commented on YARN-3166:
---

bq.  But as a general rule, I think there could be issues...

Yeah, it could be, but hopefully it won't be significant. In YARN-3240, I've 
made a patch to add new client APIs into TimelineClient. Taking a close look at 
TimelineClientImpl, the code operating on the old data model is a relatively 
small piece, while we will need to carry over most of the skeleton code as well 
as part of the APIs, i.e. DT operations.

 [Source organization] Decide detailed package structures for timeline service 
 v2 components
 ---

 Key: YARN-3166
 URL: https://issues.apache.org/jira/browse/YARN-3166
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Open this JIRA to track all discussions on detailed package structures for 
 timeline services v2. This JIRA is for discussion only.
 For our current timeline service v2 design, aggregator (previously called 
 writer) implementation is in hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
 In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
 want to put reader related implementations into hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.reader}}
 Both readers and aggregators will expose features that may be used by YARN 
 and other 3rd party components, such as aggregator/reader APIs. For those 
 features, maybe we would like to expose their interfaces to 
 hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
 Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329998#comment-14329998
 ] 

Mit Desai commented on YARN-3238:
-

+1 (non-binding)
Looks good to me

 Connection timeouts to nodemanagers are retried at multiple levels
 --

 Key: YARN-3238
 URL: https://issues.apache.org/jira/browse/YARN-3238
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-3238.001.patch


 The IPC layer will retry connection timeouts automatically (see Client.java), 
 but we are also retrying them with YARN's RetryPolicy put in place when the 
 NM proxy is created.  This causes a two-level retry mechanism where the IPC 
 layer has already retried quite a few times (45 by default) for each YARN 
 RetryPolicy error that is retried.  The end result is that NM clients can 
 wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329818#comment-14329818
 ] 

Hadoop QA commented on YARN-3238:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699964/YARN-3238.001.patch
  against trunk revision f56c65b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6687//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6687//console

This message is automatically generated.

 Connection timeouts to nodemanagers are retried at multiple levels
 --

 Key: YARN-3238
 URL: https://issues.apache.org/jira/browse/YARN-3238
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-3238.001.patch


 The IPC layer will retry connection timeouts automatically (see Client.java), 
 but we are also retrying them with YARN's RetryPolicy put in place when the 
 NM proxy is created.  This causes a two-level retry mechanism where the IPC 
 layer has already retried quite a few times (45 by default) for each YARN 
 RetryPolicy error that is retried.  The end result is that NM clients can 
 wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3240:
--
Attachment: YARN-3240.1.patch

In this JIRA, I aim to make the basic client write APIs ready. Later on, we can 
add more advanced APIs to simplify putting predefined data, such as flows. Below 
is a summary of this patch:

1. Added two Java methods to TimelineClient that operate on the new generic 
timeline entity data object. One is a blocking call, while the other is an async 
call. The two methods wrap over the same REST HTTP API but pass a different 
async param. At the server side, the aggregator can consume this param to 
determine whether to use a sync or async call to persist the timeline data.

It needs the resource URI and the context appID to know where the request 
should be sent and which conceptual per-app aggregator it should be routed to. 
This is blocked by YARN-3039. For now, I just leave them unset in the code.

2. Changed the endpoint of the per-node web service accordingly so that the 
client and the server stay paired.

3. One more data object, TimelineEntities, which is a collection of 
TimelineEntity, is added to host multiple entities in one request to the 
aggregator.

The rationale behind putting the new client API inside the existing timeline 
client is:

1. Most of the TimelineClientImpl code could be reused, including making HTTP 
calls, security, retry and so on.

2. Not all of the client APIs will be deprecated, but just those that operate on 
the old data model. For example, delegation token related APIs may still stay.
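
As a rough sketch of the client-side shape described in point 1; all names here 
are assumptions for illustration, not the actual YARN-3240 API:
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the client-side shape described above; all names here
// (TimelineClientV2Sketch, TimelineEntitiesSketch, putEntities, putEntitiesAsync,
// the async flag) are assumptions for illustration, not the actual YARN-3240 API.
final class TimelineClientV2Sketch {

  // Stand-in for the TimelineEntities collection object mentioned in point 3.
  static final class TimelineEntitiesSketch {
    final List<Object> entities = new ArrayList<Object>();
  }

  // Blocking variant: the REST request carries async=false, so the call returns
  // only after the per-app aggregator has persisted the entities.
  void putEntities(TimelineEntitiesSketch entities) throws IOException {
    post(entities, false);
  }

  // Async variant: same REST endpoint with async=true, so the aggregator may
  // defer persisting the data and the call returns immediately.
  void putEntitiesAsync(TimelineEntitiesSketch entities) throws IOException {
    post(entities, true);
  }

  private void post(TimelineEntitiesSketch entities, boolean async)
      throws IOException {
    // A real implementation would reuse TimelineClientImpl's HTTP, security and
    // retry code and route the request to the per-app aggregator; the resource
    // URI and appID routing are blocked on YARN-3039, as noted above.
  }
}
{code}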

 [Data Mode] Implement client API to put generic entities
 

 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3240.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90

2015-02-20 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329958#comment-14329958
 ] 

zhihai xu commented on YARN-2799:
-

Thanks [~djp] for the valuable feedback and for committing the patch! 
Greatly appreciated.

 cleanup TestLogAggregationService based on the change in YARN-90
 

 Key: YARN-2799
 URL: https://issues.apache.org/jira/browse/YARN-2799
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2799.000.patch, YARN-2799.001.patch, 
 YARN-2799.002.patch


 cleanup TestLogAggregationService based on the change in YARN-90.
 The following code is added to setup in YARN-90, 
 {code}
 dispatcher = createDispatcher();
 appEventHandler = mock(EventHandler.class);
 dispatcher.register(ApplicationEventType.class, appEventHandler);
 {code}
 In this case, we should remove all these code from each test function to 
 avoid duplicate code.
 Same for dispatcher.stop() which is in tearDown,
 we can remove dispatcher.stop() from from each test function also because it 
 will always be called from tearDown for each test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-3239:
-

Assignee: Jian He

 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He

 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329917#comment-14329917
 ] 

Wangda Tan commented on YARN-2986:
--

An update:

After an offline discussion with [~vinodkv] and [~jianhe], the proposed 
configuration file now looks like:
{code}
<scheduler>
  <type>capacity</type>
  <maximum-applications></maximum-applications>
  <queue-mappings></queue-mappings>
  <queue-mappings-override-enable></queue-mappings-override-enable>
  <maximum-am-resource-percent>0.3</maximum-am-resource-percent>

  <policy-properties>
    <resource-calculator>
      org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
    </resource-calculator>
  </policy-properties>

  <queue name="root">
    <queues>
      <queue name="default">
        <state>RUNNING</state>
        <acl_submit_applications>*</acl_submit_applications>
        <acl_administer_queue>*</acl_administer_queue>
        <accessible-node-labels>x</accessible-node-labels>

        <policy-properties>
          <user-limit-factor>2</user-limit-factor>
          <capacity>50</capacity>
          <maximum-capacity>90</maximum-capacity>
          <node-locality-delay>30</node-locality-delay>
          <node-labels>
            <node-label name="x">
              <capacity>20</capacity>
              <maximum-capacity>50</maximum-capacity>
            </node-label>
          </node-labels>
        </policy-properties>
      </queue>
    </queues>
  </queue>
</scheduler>
{code}

One highlight of both this proposal and the previous one is that it contains a 
policy-properties section for each configuration node, which holds 
scheduler-specific configurations, like capacity in CapacityScheduler and 
minShare in FairScheduler, etc. (policy here means a particular kind of 
scheduling method).

Other common options (those not belonging to a specific scheduler 
implementation) should be placed outside of policy-properties.

*Please feel free to share your thoughts about this proposal :).*

To move this forward, I filed several sub-tickets: YARN-3233 targets the 
configuration file (for the common scheduler and capacity scheduler) definition 
and parsing, and I will upload a patch right now. YARN-3234 is to handle 
CapacityScheduler integration with the new config file.

 (Umbrella) Support hierarchical and unified scheduler configuration
 ---

 Key: YARN-2986
 URL: https://issues.apache.org/jira/browse/YARN-2986
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
 Attachments: YARN-2986.1.patch


 Today's scheduler configuration is fragmented and non-intuitive, and needs to 
 be improved. Details in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node

2015-02-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3194:
-
Summary: RM should handle NMContainerStatuses sent by NM while registering 
if NM is Reconnected node  (was: After NM restart, RM should handle 
NMCotainerStatuses sent by NM while registering if NM is Reconnected node)

 RM should handle NMContainerStatuses sent by NM while registering if NM is 
 Reconnected node
 ---

 Key: YARN-3194
 URL: https://issues.apache.org/jira/browse/YARN-3194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: NM restart is enabled
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch


 On NM restart ,NM sends all the outstanding NMContainerStatus to RM during 
 registration. The registration can be treated by RM as New node or 
 Reconnecting node. RM triggers corresponding event on the basis of node added 
 or node reconnected state. 
 # Node added event: Again, here 2 scenarios can occur 
 ## New node is registering with different ip:port – NOT A PROBLEM
 ## Old node is re-registering because of RESYNC command from RM when RM 
 restart – NOT A PROBLEM
 # Node reconnected event : 
 ## Existing node is re-registering i.e RM treat it as reconnecting node when 
 RM is not restarted 
 ### NM RESTART NOT Enabled – NOT A PROBLEM
 ### NM RESTART is Enabled 
  Some applications are running on this node – *Problem is here*
  Zero applications are running on this node – NOT A PROBLEM
 Since the NMContainerStatuses are not handled, RM never gets to know about 
 completedContainer events and never releases the resources held by those 
 containers. RM will not allocate new containers for pending resource requests 
 until the completedContainer event is triggered. This results in applications 
 waiting indefinitely because their pending container requests are not served 
 by RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329020#comment-14329020
 ] 

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #111 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/111/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 am max retries configured as 3 at client and RM side.
 Step 1: Install cluster with NM on 2 Machines 
 Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But 
 using Hostname should fail
 Step 3: Execute a job
 Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , 
 connection loss happened.
 Observation :
 ==
 After AppAttempt_1 has moved to failed state ,release of container for 
 AppAttempt_1 and Application removal are successful. New AppAttempt_2 is 
 sponed.
 1. Then again retry for AppAttempt_1 happens.
 2. Again RM side it is trying to launch AppAttempt_1, hence fails with 
 InvalidStateTransitonException
 3. The client exited after AppAttempt_1 finished [but the job is actually 
 still running], while the configured number of app attempts is 3 and the rest 
 of the app attempts are all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 LAUNCH_FAILED at FAILED
  at 
 

[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329012#comment-14329012
 ] 

Jason Lowe commented on YARN-3131:
--

Actually just saw YARN-3232 was filed which proposes to stop exposing the 
NEW_SAVING and SUBMITTED states to clients.  If we do that then all we need to 
do in YarnClientImpl is have it throw when the non-NEW state is FAILED or 
KILLED to indicate the submit failed.
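
For illustration, a minimal sketch of the check being discussed (not the actual 
YarnClientImpl patch; the helper class and method names are made up): poll the 
application report until it leaves NEW/NEW_SAVING, and throw if it has already 
gone to FAILED or KILLED.

{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitStateCheckSketch {

  // Waits until the application has left the NEW/NEW_SAVING states and fails
  // fast if the RM has already marked it FAILED or KILLED (e.g. bad queue).
  static void waitForSubmission(YarnClient client, ApplicationId appId)
      throws YarnException, IOException, InterruptedException {
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.FAILED
          || state == YarnApplicationState.KILLED) {
        throw new YarnException("Application " + appId
            + " failed at submission time: " + report.getDiagnostics());
      }
      if (state != YarnApplicationState.NEW
          && state != YarnApplicationState.NEW_SAVING) {
        return; // SUBMITTED or beyond: the submit itself succeeded
      }
      Thread.sleep(200); // polling interval chosen arbitrarily for the sketch
    }
  }
}
{code}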

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job does get submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329084#comment-14329084
 ] 

Junping Du commented on YARN-3039:
--

Hi [~rkanter], thanks for sharing your thoughts here. 
I think that, as a generic, external service for YARN, YARN-913 may not meet our 
particular requirements here, for example: 
- the timeline service will serve as a built-in service, so it is not necessary for an 
application to register the service explicitly
- the NM also needs this aggregator info to aggregate information related to the containers 
running on top of it
- we have a preference to bind the service to the local node of the AM container
- currently, the launching of the NM aggregators is not done via YARN service containers 
(see YARN-3033)
Also, I think we may not want this built-in service (as a standalone feature) 
to depend on another big feature that is still in progress when that is unnecessary. Thoughts?

 [Aggregator wireup] Implement ATS writer service discovery
 --

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter

 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329050#comment-14329050
 ] 

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2061 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2061/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


 Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
 FINAL_SAVING
 -

 Key: YARN-933
 URL: https://issues.apache.org/jira/browse/YARN-933
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch


 am max retries configured as 3 at client and RM side.
 Step 1: Install cluster with NM on 2 Machines 
 Step 2: Make Ping using ip from RM machine to NM1 machine as successful ,But 
 using Hostname should fail
 Step 3: Execute a job
 Step 4: After AM [ AppAttempt_1 ] allocation to NM1 machine is done , 
 connection loss happened.
 Observation :
 ==
 After AppAttempt_1 has moved to the failed state, release of the container for 
 AppAttempt_1 and removal of the application are successful. A new AppAttempt_2 is 
 spawned.
 1. Then a retry for AppAttempt_1 happens again.
 2. On the RM side it again tries to launch AppAttempt_1, hence it fails with 
 InvalidStateTransitonException
 3. The client exited after AppAttempt_1 finished [but the job is actually 
 still running], even though 3 app attempts are configured and the remaining 
 attempts are all spawned and running.
 RMLogs:
 ==
 2013-07-17 16:22:51,013 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
 maxRetries=45
 2013-07-17 16:36:07,091 INFO 
 org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
 Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
 to EXPIRED
 2013-07-17 16:36:07,093 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Registering appattempt_1373952096466_0056_02
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
 2013-07-17 16:36:07,131 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Application removed - appId: application_1373952096466_0056 user: Rex 
 leaf-queue of parent: root #applications: 35
 2013-07-17 16:36:07,132 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application Submission: appattempt_1373952096466_0056_02, 
 2013-07-17 16:36:07,138 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
 maxRetries=45
 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
 maxRetries=45
 2013-07-17 16:38:56,207 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
 launching appattempt_1373952096466_0056_01. Got exception: 
 java.lang.reflect.UndeclaredThrowableException
 2013-07-17 16:38:56,207 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 LAUNCH_FAILED at FAILED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)

[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-20 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3039:
-
Attachment: Service Binding for applicationaggregator of ATS (draft).pdf

I have put some thoughts into a draft proposal here. Everyone is welcome to comment!
 [~rkanter], have you started the work on this JIRA? If not, do you mind if I take 
it over? Thanks!

 [Aggregator wireup] Implement ATS writer service discovery
 --

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Robert Kanter
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329051#comment-14329051
 ] 

Hudson commented on YARN-3076:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2061 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2061/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java


 Add API/Implementation to YarnClient to retrieve label-to-node mapping
 --

 Key: YARN-3076
 URL: https://issues.apache.org/jira/browse/YARN-3076
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
 YARN-3076.003.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3194) RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329048#comment-14329048
 ] 

Hudson commented on YARN-3194:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7162 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7162/])
YARN-3194. RM should handle NMContainerStatuses sent by NM while registering if 
NM is Reconnected node. Contributed by Rohith (jlowe: rev 
a64dd3d24bfcb9af21eb63869924f6482b147fd3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeReconnectEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java


 RM should handle NMContainerStatuses sent by NM while registering if NM is 
 Reconnected node
 ---

 Key: YARN-3194
 URL: https://issues.apache.org/jira/browse/YARN-3194
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: NM restart is enabled
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Fix For: 2.7.0

 Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch


 On NM restart ,NM sends all the outstanding NMContainerStatus to RM during 
 registration. The registration can be treated by RM as New node or 
 Reconnecting node. RM triggers corresponding event on the basis of node added 
 or node reconnected state. 
 # Node added event : Again here 2 scenario's can occur 
 ## New node is registering with different ip:port – NOT A PROBLEM
 ## Old node is re-registering because of RESYNC command from RM when RM 
 restart – NOT A PROBLEM
 # Node reconnected event : 
 ## Existing node is re-registering i.e RM treat it as reconnecting node when 
 RM is not restarted 
 ### NM RESTART NOT Enabled – NOT A PROBLEM
 ### NM RESTART is Enabled 
  Some applications are running on this node – *Problem is here*
  Zero applications are running on this node – NOT A PROBLEM
 Since the NMContainerStatuses are not handled, the RM never gets to know about 
 completed containers and never releases the resources held by those containers. The RM 
 will not allocate new containers for pending resource requests until the 
 completedContainer event is triggered. This results in applications waiting 
 indefinitely because their pending container requests are not served by the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3115) [Aggregator wireup] Work-preserving restarting of per-node aggregator

2015-02-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329103#comment-14329103
 ] 

Junping Du commented on YARN-3115:
--

Hi [~zjshen], have you already started the work on this? If not, I am quite 
familiar with NM work-preserving restart. Can I take this JIRA on? Thanks!

 [Aggregator wireup] Work-preserving restarting of per-node aggregator
 -

 Key: YARN-3115
 URL: https://issues.apache.org/jira/browse/YARN-3115
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-3030 makes the per-node aggregator work as an aux service of the NM. It 
 contains the states of the per-app aggregators corresponding to the running 
 AM containers on this NM. While the NM is restarted in work-preserving mode, this 
 per-node aggregator information needs to be carried over across the restart too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-02-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3241:

Description: 
Leading space, trailing space and empty sub queue name may cause 
MetricsException(Metrics source XXX already exists! ) when add application to 
FairScheduler.
The reason is because QueueMetrics parse the queue name different from the 
QueueManager.
QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space 
and trailing space in the sub queue name, It will also remove empty sub queue 
name.
{code}
  static final Splitter Q_SPLITTER =
  Splitter.on('.').omitEmptyStrings().trimResults(); 
{code}
But QueueManager won't remove Leading space, trailing space and empty sub queue 
name.
This will cause out of sync between FSQueue and FSQueueMetrics.
QueueManager will think two queue names are different so it will try to create 
a new queue.
But FSQueueMetrics will treat these two queue names as same which will create 
Metrics source XXX already exists! MetricsException.

  was:
Leading space, trailing space and empty sub queue name may cause 
MetricsException(Metrics source XXX already exists! ) when add application to 
FairScheduler.
The reason is because QueueMetrics parse the queue name different from the 
QueueManager.
QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space 
and trailing space in the sub queue name, It will also remove empty sub queue 
name.
{code}
  static final Splitter Q_SPLITTER =
  Splitter.on('.').omitEmptyStrings().trimResults(); 
{code}
But QueueManager won't remove Leading space, trailing space and empty sub queue 
name.
This will cause out of sync between FSQueue and FSQueueMetrics.
QueueManager will think two queue names are different so it will try to create 
a new queue.
But FSQueueMetrics will think these two queue names as same which will create 
Metrics source XXX already exists! MetricsException.


 Leading space, trailing space and empty sub queue name may cause 
 MetricsException for fair scheduler
 

 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu

 Leading space, trailing space and empty sub queue name may cause 
 MetricsException(Metrics source XXX already exists! ) when add application to 
 FairScheduler.
 The reason is because QueueMetrics parse the queue name different from the 
 QueueManager.
 QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space 
 and trailing space in the sub queue name, It will also remove empty sub queue 
 name.
 {code}
   static final Splitter Q_SPLITTER =
   Splitter.on('.').omitEmptyStrings().trimResults(); 
 {code}
 But QueueManager won't remove Leading space, trailing space and empty sub 
 queue name.
 This will cause out of sync between FSQueue and FSQueueMetrics.
 QueueManager will think two queue names are different so it will try to 
 create a new queue.
 But FSQueueMetrics will treat these two queue names as same which will create 
 Metrics source XXX already exists! MetricsException.
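
 For illustration, a small standalone Guava sketch (not YARN code; the queue 
 names below are made up) of the mismatch described above: Q_SPLITTER collapses 
 a name containing stray spaces and an empty sub-queue onto the same components 
 as the clean name, while a plain string comparison of the two names does not.

 {code}
 import com.google.common.base.Splitter;
 import com.google.common.collect.Lists;

 public class QueueNameParsingDemo {

   // The same splitter definition that QueueMetrics uses.
   static final Splitter Q_SPLITTER =
       Splitter.on('.').omitEmptyStrings().trimResults();

   public static void main(String[] args) {
     String clean = "root.parent.child";
     String dirty = "root. parent ..child"; // stray spaces + empty sub-queue

     // Splitter-based parsing yields identical components for both names...
     System.out.println(Lists.newArrayList(Q_SPLITTER.split(clean))); // [root, parent, child]
     System.out.println(Lists.newArrayList(Q_SPLITTER.split(dirty))); // [root, parent, child]

     // ...but a plain string comparison (QueueManager-style lookup key) differs,
     // so a second FSQueue is created while its metrics source name collides.
     System.out.println(clean.equals(dirty)); // false
   }
 }
 {code}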



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3241) Leading space, trailing space and empty sub queue name may cause MetricsException for fair scheduler

2015-02-20 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3241:

Description: 
Leading space, trailing space and empty sub queue name may cause 
MetricsException(Metrics source XXX already exists! ) when add application to 
FairScheduler.
The reason is because QueueMetrics parse the queue name different from the 
QueueManager.
QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space 
and trailing space in the sub queue name, It will also remove empty sub queue 
name.
{code}
  static final Splitter Q_SPLITTER =
  Splitter.on('.').omitEmptyStrings().trimResults(); 
{code}
But QueueManager won't remove Leading space, trailing space and empty sub queue 
name.
This will cause out of sync between FSQueue and FSQueueMetrics.
QueueManager will think two queue names are different so it will try to create 
a new queue.
But FSQueueMetrics will treat these two queue names as same queue which will 
create Metrics source XXX already exists! MetricsException.

  was:
Leading space, trailing space and empty sub queue name may cause 
MetricsException(Metrics source XXX already exists! ) when add application to 
FairScheduler.
The reason is because QueueMetrics parse the queue name different from the 
QueueManager.
QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space 
and trailing space in the sub queue name, It will also remove empty sub queue 
name.
{code}
  static final Splitter Q_SPLITTER =
  Splitter.on('.').omitEmptyStrings().trimResults(); 
{code}
But QueueManager won't remove Leading space, trailing space and empty sub queue 
name.
This will cause out of sync between FSQueue and FSQueueMetrics.
QueueManager will think two queue names are different so it will try to create 
a new queue.
But FSQueueMetrics will treat these two queue names as same which will create 
Metrics source XXX already exists! MetricsException.


 Leading space, trailing space and empty sub queue name may cause 
 MetricsException for fair scheduler
 

 Key: YARN-3241
 URL: https://issues.apache.org/jira/browse/YARN-3241
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: zhihai xu
Assignee: zhihai xu

 Leading space, trailing space and empty sub queue name may cause 
 MetricsException(Metrics source XXX already exists! ) when add application to 
 FairScheduler.
 The reason is because QueueMetrics parse the queue name different from the 
 QueueManager.
 QueueMetrics use Q_SPLITTER to parse queue name, it will remove Leading space 
 and trailing space in the sub queue name, It will also remove empty sub queue 
 name.
 {code}
   static final Splitter Q_SPLITTER =
   Splitter.on('.').omitEmptyStrings().trimResults(); 
 {code}
 But QueueManager won't remove Leading space, trailing space and empty sub 
 queue name.
 This will cause out of sync between FSQueue and FSQueueMetrics.
 QueueManager will think two queue names are different so it will try to 
 create a new queue.
 But FSQueueMetrics will treat these two queue names as same queue which will 
 create Metrics source XXX already exists! MetricsException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-20 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329136#comment-14329136
 ] 

Rohith commented on YARN-3222:
--

I see there are 2 ways of fixing the issue.
# Always send the NODE_RESOURCE_UPDATE event to the scheduler via 
RMNodeEventType.RESOURCE_UPDATE of RMNode.
# When the NODE_ADDED event is sent to the scheduler, sending another 
NODE_RESOURCE_UPDATE event for the same node from ReconnectedNodeTransition is a 
duplicate update request, because the scheduler has already updated its resources with 
the newly added node, i.e. NODE_REMOVED -- NODE_ADDED -- NODE_RESOURCE_UPDATE. So if NO 
applications are running on the node, then it is not required to send the 
node_resource_update request.

I would prefer the 2nd option because one duplicate resource update can be 
optimized away here (see the sketch below). 
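
For illustration, a toy sketch of option 2 (all class and method names below are 
made up; this is not RMNodeImpl or scheduler code): the separate resource-update 
is only needed when applications are still running on the reconnected node, since 
the re-add path already carries the new capability.

{code}
import java.util.List;

class ReconnectResourceUpdateSketch {

  interface SchedulerEvents {
    void nodeAdded(String nodeId, int memoryMb, int vcores);
    void nodeResourceUpdated(String nodeId, int memoryMb, int vcores);
  }

  static void onNodeReconnected(SchedulerEvents scheduler, String nodeId,
      int memoryMb, int vcores, List<String> runningApps) {
    if (runningApps.isEmpty()) {
      // No running apps: the node is re-added, and the add already carries the
      // new capability, so a separate resource-update would be a duplicate.
      scheduler.nodeAdded(nodeId, memoryMb, vcores);
    } else {
      // Apps still running: keep the node and only refresh its capability.
      scheduler.nodeResourceUpdated(nodeId, memoryMb, vcores);
    }
  }
}
{code}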

 RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
 order
 ---

 Key: YARN-3222
 URL: https://issues.apache.org/jira/browse/YARN-3222
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical

 When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
 scheduler with the events node_added, node_removed or node_resource_update. These 
 events should be delivered in sequential order, i.e. the node_added event 
 followed by the node_resource_update event.
 But if the node is reconnected with a different http port, the order of the 
 scheduler events is node_removed -- node_resource_update -- node_added, 
 which causes the scheduler to not find the node, throw an NPE, and make the RM exit.
 The node_resource_update event should always be triggered via 
 RMNodeEventType.RESOURCE_UPDATE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90

2015-02-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329209#comment-14329209
 ] 

Junping Du commented on YARN-2799:
--

+1. Committing it now.

 cleanup TestLogAggregationService based on the change in YARN-90
 

 Key: YARN-2799
 URL: https://issues.apache.org/jira/browse/YARN-2799
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Attachments: YARN-2799.000.patch, YARN-2799.001.patch, 
 YARN-2799.002.patch


 cleanup TestLogAggregationService based on the change in YARN-90.
 The following code is added to setup in YARN-90, 
 {code}
 dispatcher = createDispatcher();
 appEventHandler = mock(EventHandler.class);
 dispatcher.register(ApplicationEventType.class, appEventHandler);
 {code}
 In this case, we should remove all this code from each test function to 
 avoid duplicated code.
 The same goes for dispatcher.stop(), which is in tearDown: 
 we can remove dispatcher.stop() from each test function as well, because it 
 will always be called from tearDown for each test.
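
 As an illustration of the intended shape (the class and helper names below are 
 stand-ins, not the real TestLogAggregationService), the shared wiring lives once 
 in setup/tearDown and each test body keeps only its own logic:

 {code}
 import static org.mockito.Mockito.mock;

 import org.junit.After;
 import org.junit.Before;

 public class LogAggregationServiceTestSketch {

   private FakeDispatcher dispatcher;
   private FakeEventHandler appEventHandler;

   @Before
   public void setup() {
     // Shared wiring done once here instead of being repeated in every test.
     dispatcher = new FakeDispatcher();
     appEventHandler = mock(FakeEventHandler.class);
     dispatcher.register(appEventHandler);
   }

   @After
   public void tearDown() {
     // Always runs, so individual tests no longer need to stop the dispatcher.
     dispatcher.stop();
   }

   // Minimal stand-ins so the sketch compiles without YARN test internals.
   static class FakeDispatcher {
     void register(Object handler) { }
     void stop() { }
   }
   interface FakeEventHandler { }
 }
 {code}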



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329194#comment-14329194
 ] 

Hadoop QA commented on YARN-2083:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650950/YARN-2083-3.patch
  against trunk revision a64dd3d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6682//console

This message is automatically generated.

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: fairscheduler
 Attachments: YARN-2083-1.patch, YARN-2083-2.patch, YARN-2083-3.patch, 
 YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee that the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function fitsInWithoutEqual and use it instead of 
 fitsIn in this case.
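
 For illustration only, a sketch of the fitsInWithoutEqual idea (not the proposed 
 patch; it assumes memory and vcores are the two tracked dimensions, as in the 
 2.x Resource API):

 {code}
 import org.apache.hadoop.yarn.api.records.Resource;

 public class StrictFitSketch {

   // A queue may only take another container while its used resources stay
   // strictly below the configured maximum on every dimension.
   static boolean strictlyBelow(Resource used, Resource max) {
     return used.getMemory() < max.getMemory()
         && used.getVirtualCores() < max.getVirtualCores();
   }
 }
 {code}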



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-20 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3131:
---
Attachment: yarn_3131_v2.patch

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job does get submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3235) Support uniformed scheduler configuration in FairScheduler

2015-02-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329179#comment-14329179
 ] 

Karthik Kambatla commented on YARN-3235:


It is all yours, [~Naganarasimha]. Thanks for checking in. 

 Support uniformed scheduler configuration in FairScheduler
 --

 Key: YARN-3235
 URL: https://issues.apache.org/jira/browse/YARN-3235
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Naganarasimha G R





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329187#comment-14329187
 ] 

Hadoop QA commented on YARN-2799:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698939/YARN-2799.002.patch
  against trunk revision a64dd3d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6681//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6681//console

This message is automatically generated.

 cleanup TestLogAggregationService based on the change in YARN-90
 

 Key: YARN-2799
 URL: https://issues.apache.org/jira/browse/YARN-2799
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Attachments: YARN-2799.000.patch, YARN-2799.001.patch, 
 YARN-2799.002.patch


 cleanup TestLogAggregationService based on the change in YARN-90.
 The following code is added to setup in YARN-90, 
 {code}
 dispatcher = createDispatcher();
 appEventHandler = mock(EventHandler.class);
 dispatcher.register(ApplicationEventType.class, appEventHandler);
 {code}
 In this case, we should remove all this code from each test function to 
 avoid duplicated code.
 The same goes for dispatcher.stop(), which is in tearDown: 
 we can remove dispatcher.stop() from each test function as well, because it 
 will always be called from tearDown for each test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329224#comment-14329224
 ] 

Hudson commented on YARN-2799:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7163 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7163/])
YARN-2799. Cleanup TestLogAggregationService based on the change in YARN-90. 
Contributed by Zhihai Xu (junping_du: rev 
c33ae271c24f0770c9735ccd2086cafda4f4e0b2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


 cleanup TestLogAggregationService based on the change in YARN-90
 

 Key: YARN-2799
 URL: https://issues.apache.org/jira/browse/YARN-2799
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2799.000.patch, YARN-2799.001.patch, 
 YARN-2799.002.patch


 cleanup TestLogAggregationService based on the change in YARN-90.
 The following code is added to setup in YARN-90, 
 {code}
 dispatcher = createDispatcher();
 appEventHandler = mock(EventHandler.class);
 dispatcher.register(ApplicationEventType.class, appEventHandler);
 {code}
 In this case, we should remove all this code from each test function to 
 avoid duplicated code.
 The same goes for dispatcher.stop(), which is in tearDown: 
 we can remove dispatcher.stop() from each test function as well, because it 
 will always be called from tearDown for each test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2015-02-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2083:
---
Labels: fairscheduler  (was: assignContainer fair scheduler)

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: fairscheduler
 Attachments: YARN-2083-1.patch, YARN-2083-2.patch, YARN-2083-3.patch, 
 YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee that the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function fitsInWithoutEqual and use it instead of 
 fitsIn in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good again

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329225#comment-14329225
 ] 

Hudson commented on YARN-90:


FAILURE: Integrated in Hadoop-trunk-Commit #7163 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7163/])
YARN-2799. Cleanup TestLogAggregationService based on the change in YARN-90. 
Contributed by Zhihai Xu (junping_du: rev 
c33ae271c24f0770c9735ccd2086cafda4f4e0b2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt


 NodeManager should identify failed disks becoming good again
 

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Fix For: 2.6.0

 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
 apache-yarn-90.10.patch, apache-yarn-90.2.patch, apache-yarn-90.3.patch, 
 apache-yarn-90.4.patch, apache-yarn-90.5.patch, apache-yarn-90.6.patch, 
 apache-yarn-90.7.patch, apache-yarn-90.8.patch, apache-yarn-90.9.patch


 MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to 
 reuse good disks (which could have been bad some time back).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329598#comment-14329598
 ] 

Zhijie Shen commented on YARN-3166:
---

bq. In such a scenario if we try to Reuse REST then we might end up with lot of 
If else...

I'm not sure if I understand your question correctly, but I think we would not 
end up with a lot of if/else. The proposal is:

1. In TimelineClient, we keep the existing methods that operate on the old data 
model, but mark them deprecated individually.

2. In TimelineClient, we create the new methods that operate on the new data 
model.

The methods operating on the new and old data models are kept separate.
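
A tiny sketch of that pattern (placeholder class and method names, not the real 
TimelineClient API): the v1 methods stay but are individually marked deprecated, 
and the v2 methods live next to them.

{code}
// Placeholder types standing in for the v1 and v2 timeline data models.
class TimelineEntityV1 { }
class TimelineEntityV2 { }

public abstract class TimelineClientSketch {

  /** Old-data-model write: kept working for existing callers, but deprecated. */
  @Deprecated
  public abstract void putEntities(TimelineEntityV1... entities);

  /** New-data-model write introduced alongside the deprecated method. */
  public abstract void putEntitiesV2(TimelineEntityV2... entities);
}
{code}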

 [Source organization] Decide detailed package structures for timeline service 
 v2 components
 ---

 Key: YARN-3166
 URL: https://issues.apache.org/jira/browse/YARN-3166
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Open this JIRA to track all discussions on detailed package structures for 
 timeline services v2. This JIRA is for discussion only.
 For our current timeline service v2 design, aggregator (previously called 
 writer) implementation is in hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
 In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
 want to put reader related implementations into hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.reader}}
 Both readers and aggregators will expose features that may be used by YARN 
 and other 3rd party components, such as aggregator/reader APIs. For those 
 features, maybe we would like to expose their interfaces to 
 hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
 Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-02-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329275#comment-14329275
 ] 

Junping Du commented on YARN-3223:
--

Hi [~varun_saxena], I would be glad if you can work on this JIRA. :) My 
recommendation here is just that you check some of the sub-tasks under YARN-291, 
which are related to the work here.

 Resource update during NM graceful decommission
 ---

 Key: YARN-3223
 URL: https://issues.apache.org/jira/browse/YARN-3223
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Junping Du
Assignee: Varun Saxena

 During NM graceful decommission, we should handle resource updates properly. This 
 includes: make RMNode keep track of the old resource for a possible rollback, keep the 
 available resource at 0, and have the used resource get updated when a 
 container finishes (see the sketch below).
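
 As a rough illustration of that bookkeeping (a toy model only; this is 
 hypothetical code, not RMNode or RMNodeImpl):

 {code}
 import org.apache.hadoop.yarn.api.records.Resource;

 public class DecommissioningNodeState {

   private final Resource originalTotal; // kept for a possible rollback
   private Resource used;

   public DecommissioningNodeState(Resource total, Resource currentlyUsed) {
     this.originalTotal = Resource.newInstance(
         total.getMemory(), total.getVirtualCores());
     this.used = Resource.newInstance(
         currentlyUsed.getMemory(), currentlyUsed.getVirtualCores());
   }

   /** While decommissioning, nothing new can be scheduled on this node. */
   public Resource getAvailable() {
     return Resource.newInstance(0, 0);
   }

   /** Containers that finish release their share of the used resources. */
   public void containerFinished(Resource released) {
     used = Resource.newInstance(
         Math.max(0, used.getMemory() - released.getMemory()),
         Math.max(0, used.getVirtualCores() - released.getVirtualCores()));
   }

   /** If the decommission is cancelled, the node goes back to its full size. */
   public Resource rollbackTotal() {
     return originalTotal;
   }
 }
 {code}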



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329324#comment-14329324
 ] 

Wangda Tan commented on YARN-3230:
--

Committing..

 Clarify application states on the web UI
 

 Key: YARN-3230
 URL: https://issues.apache.org/jira/browse/YARN-3230
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
 YARN-3230.3.patch, application page.png


 Today, application states are simply surfaced as a single word on the web UI. 
 Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. 
 This jira is to clarify the meaning of these states, i.e. things like what the 
 application is waiting for in each state. 
 In addition, the difference between the application state and the FinalStatus is 
 fairly confusing to users, especially when state=FINISHED but 
 FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329334#comment-14329334
 ] 

Hudson commented on YARN-3230:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7164 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7164/])
YARN-3230. Clarify application states on the web UI. (Jian He via wangda) 
(wangda: rev ce5bf927c3d9f212798de1bf8706e5e9def235a1)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java


 Clarify application states on the web UI
 

 Key: YARN-3230
 URL: https://issues.apache.org/jira/browse/YARN-3230
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
 YARN-3230.3.patch, application page.png


 Today, application states are simply surfaced as a single word on the web UI. 
 Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. 
 This jira is to clarify the meaning of these states, i.e. things like what the 
 application is waiting for in each state. 
 In addition, the difference between the application state and the FinalStatus is 
 fairly confusing to users, especially when state=FINISHED but 
 FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3197:
---
Attachment: YARN-3197.002.patch

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3115) [Aggregator wireup] Work-preserving restarting of per-node aggregator

2015-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329299#comment-14329299
 ] 

Zhijie Shen commented on YARN-3115:
---

Please feel free to take it over.

 [Aggregator wireup] Work-preserving restarting of per-node aggregator
 -

 Key: YARN-3115
 URL: https://issues.apache.org/jira/browse/YARN-3115
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-3030 makes the per-node aggregator work as an aux service of the NM. It 
 contains the states of the per-app aggregators corresponding to the running 
 AM containers on this NM. While the NM is restarted in work-preserving mode, this 
 per-node aggregator information needs to be carried over across the restart too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329308#comment-14329308
 ] 

Hadoop QA commented on YARN-3131:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699905/yarn_3131_v2.patch
  against trunk revision c33ae27.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6683//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6683//console

This message is automatically generated.

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job does get submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329356#comment-14329356
 ] 

Vinod Kumar Vavilapalli commented on YARN-2423:
---

Apologies for coming in late, but should we rewrite this to be in line with 
YARN-2928? Given YARN-2928, I don't see an immediate value in having the patch 
as is (with a new API that nobody uses in the interim) and then rewriting it once 
more for YARN-2928. I know considerable effort has been spent on this, but what 
do you folks think?

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329358#comment-14329358
 ] 

Wangda Tan commented on YARN-1963:
--

That's great! Thanks.

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329368#comment-14329368
 ] 

Zhijie Shen commented on YARN-3031:
---

[~vrushalic], YARN-3041 is committed, would you mind updating this patch 
accordingly? And we'd like to review it again. Thanks!

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-20 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329375#comment-14329375
 ] 

Vrushali C commented on YARN-3031:
--

Yes, I will.. 

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2495:
-
Target Version/s: 2.8.0  (was: 2.6.0)

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow admins to specify labels on each NM; this covers:
 - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
 using the script suggested by [~aw] (YARN-2729))
 - The NM will send labels to the RM via the ResourceTracker API
 - The RM will set labels in the NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-20 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329380#comment-14329380
 ] 

Chang Li commented on YARN-3131:


Updated the patch, fixed the unit test and improved the code style. Thanks to both 
[~jlowe] and [~hitesh] for providing valuable opinions. 
[~hitesh], we could open new jira(s) to address the issue of confirming that the AM gets 
launched. The current fix could be a short-term solution because the transition 
from the submitted state to the accepted state won't take too long, as explained by Jason. We 
are not polling for the Running/Failed state. It's not optimal, but the situation 
of submitting work to an incorrect queue isn't common either.
[~zjshen] [~jianhe], could you please kindly also comment on this problem? Thanks. 

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch


 Just ran into an issue when submitting a job to a non-existent queue: 
 YarnClient raises no exception. Though the job does get submitted 
 successfully and just fails immediately after, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329393#comment-14329393
 ] 

Naganarasimha G R commented on YARN-3166:
-

Hi [~zjshen]  [~sjlee0]
bq. I'm not aware that RM has something similar to NM aux service to decouple 
RM and aggregator, too.
+1 for this approach; I was thinking along the same lines too, and I had actually 
raised a jira, YARN-2267, for AUX-service support in the RM for some other functionality. 

bq. The benefit is that we can reuse the whole skeleton, including http rest 
wrapper, security and retry code, but just need to handle some different data 
objects, and direct the request to a different location. Instead of deprecate 
the whole class, we can deprecate the individual methods. Thought?
One problem I can think of: as per Sangjin's comments, until we remove ATS v1 we 
need to support both, with configuration enabling either one. In such a scenario, 
if we try to reuse REST then we might end up with a lot of if/else...

 [Source organization] Decide detailed package structures for timeline service 
 v2 components
 ---

 Key: YARN-3166
 URL: https://issues.apache.org/jira/browse/YARN-3166
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Open this JIRA to track all discussions on detailed package structures for 
 timeline services v2. This JIRA is for discussion only.
 For our current timeline service v2 design, aggregator (previously called 
 writer) implementation is in hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
 In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
 want to put reader related implementations into hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.reader}}
 Both readers and aggregators will expose features that may be used by YARN 
 and other 3rd party components, such as aggregator/reader APIs. For those 
 features, maybe we would like to expose their interfaces to 
 hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
 Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329410#comment-14329410
 ] 

Hadoop QA commented on YARN-3197:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699909/YARN-3197.002.patch
  against trunk revision c33ae27.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6684//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6684//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6684//console

This message is automatically generated.

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329359#comment-14329359
 ] 

Wangda Tan commented on YARN-1963:
--

That's great! Thanks.

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329357#comment-14329357
 ] 

Wangda Tan commented on YARN-1963:
--

That's great! Thanks.

 Support priorities across applications within the same queue 
 -

 Key: YARN-1963
 URL: https://issues.apache.org/jira/browse/YARN-1963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Reporter: Arun C Murthy
Assignee: Sunil G
 Attachments: YARN Application Priorities Design.pdf, YARN Application 
 Priorities Design_01.pdf


 It will be very useful to support priorities among applications within the 
 same queue, particularly in production scenarios. It allows for finer-grained 
 controls without having to force admins to create a multitude of queues, plus 
 allows existing applications to continue using existing queues which are 
 usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-02-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-3003.
--
Resolution: Duplicate

[~varun_saxena],
Thanks for the reminder. I just reopened and then resolved this as a duplicate: since 
the patch for this JIRA was split into two other JIRAs, there's no code actually 
committed for this one.

 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
 Attachments: YARN-3003.001.patch, YARN-3003.002.patch


 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 Client (such as Slider) may be interested in label to node mapping - given 
 label, return the nodes with this label.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-02-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reopened YARN-3003:
--

 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
 Attachments: YARN-3003.001.patch, YARN-3003.002.patch


 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 Client (such as Slider) may be interested in label to node mapping - given 
 label, return the nodes with this label.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329376#comment-14329376
 ] 

Naganarasimha G R commented on YARN-2495:
-

Hi [~wangda],
Thanks for discussing and summarizing this. A few points to discuss:
# YARN-2923 (configuration-based NodeLabelProviderService): what is the scope of 
that JIRA, given that it is yet another classification under {{distributed}}? Is it 
required at all?
# Apart from Vinod's suggestion, I would like to add one more change: in some JIRAs 
we discussed taking a fail-fast approach, i.e. in our case, if script-based 
distributed configuration is enabled but the script file doesn't exist, lacks 
sufficient permissions, etc., then the NM should fail to start (see the sketch after 
this list).
# YARN-2980 is mostly just moving NodeHealthScriptRunner to Hadoop Common, and its 
code differs a lot from what we require. If you remember, I had earlier refactored 
that code to make it reusable for our scenario, but you had suggested not to touch 
the existing NodeHealthScriptRunner and instead to make ScriptBasedNodeLabelsProvider 
similar to the node health script runner while serving the needs of node labels.
# The target version was earlier mentioned as 2.6; do we change it to 2.7 or 2.8?
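
For point 2, a rough sketch of what the fail-fast check could look like during NM 
service init (the class name and configuration key are hypothetical placeholders, 
not the actual patch):

{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Hypothetical sketch of fail-fast validation at NM startup;
// "ScriptBasedNodeLabelsProvider" and the config key are placeholders.
class ScriptBasedNodeLabelsProvider extends AbstractService {
  // Placeholder property name, for illustration only.
  static final String NODE_LABELS_SCRIPT_PATH =
      "yarn.nodemanager.node-labels.script.path";

  ScriptBasedNodeLabelsProvider() {
    super(ScriptBasedNodeLabelsProvider.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    String scriptPath = conf.get(NODE_LABELS_SCRIPT_PATH);
    File script = (scriptPath == null) ? null : new File(scriptPath);
    // Fail fast: refuse to start the NM if the configured script is unusable.
    if (script == null || !script.exists() || !script.canExecute()) {
      throw new IOException("Distributed node-labels script is not configured,"
          + " does not exist, or is not executable: " + scriptPath);
    }
    super.serviceInit(conf);
  }
}
{code}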



 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow admins to specify labels on each NM. This covers:
 - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using 
 the script suggested by [~aw] (YARN-2729))
 - The NM will send labels to the RM via the ResourceTracker API
 - The RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329399#comment-14329399
 ] 

Wangda Tan commented on YARN-2495:
--

For your questions:
1. It's not the first priority; let's just keep it here and move on with the script 
one.
2. That makes sense. Could you take a look at the health-check script and confirm it 
is the same as what you mentioned? (I guess so.)
3. I remember that :) and it makes sense to me. If any merge is needed, we can tackle 
it in a separate JIRA.
4. I just marked it for 2.8; since 2.7 is planned to be released within two weeks, I 
think it will be hard to fit into that timeframe.

IIRC, this JIRA contains an abstract node label provider only. I suggest not adding 
more options for configuring the node label provider for now (say, setting 
provider=script-based-provider), to avoid potentially unnecessary switches.

For now, as we discussed, you can leave an empty provider implementation (though a 
proper implementation is needed for tests); the script-based provider can be 
addressed separately. A rough sketch of such an abstract provider follows.
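
A minimal sketch of such an abstract provider, under the assumption that it runs as 
an NM service (class and method names are illustrative; the real patch may differ):

{code:java}
import java.util.Set;

import org.apache.hadoop.service.AbstractService;

// Illustrative sketch of an abstract node-labels provider running inside the NM.
// Concrete subclasses (e.g. a config-based or script-based provider) would refresh
// the label set; the NM heartbeat code would read it via getNodeLabels().
abstract class NodeLabelsProvider extends AbstractService {

  protected NodeLabelsProvider(String name) {
    super(name);
  }

  // Latest labels for this node, to be sent to the RM via the ResourceTracker API.
  public abstract Set<String> getNodeLabels();
}
{code}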

 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
 YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
 YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
 YARN-2495_20141022.1.patch


 The target of this JIRA is to allow admins to specify labels on each NM. This covers:
 - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using 
 the script suggested by [~aw] (YARN-2729))
 - The NM will send labels to the RM via the ResourceTracker API
 - The RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329425#comment-14329425
 ] 

Wangda Tan commented on YARN-3197:
--

I think non-alive container is not correct, since all completed containers reported 
from the NM are non-alive. I suggest saying something like {{containerId=xx 
completed with status=yyy from completed or unknown application id=zzz}} instead.

And I suggest improving the following log a little bit, as I noted in:
https://issues.apache.org/jira/browse/YARN-3197?focusedCommentId=14326344page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14326344

{{If the RM can get the RMContainer, the application is definitely not unknown; the 
message should indicate that the application may be completed as well.}}
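
A minimal sketch of what the clearer message could look like (the helper, class, and 
variable names are illustrative only, not the actual YARN-3197 patch):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Illustrative helper only -- not the actual patch.
class CompletedContainerLogging {
  private static final Log LOG =
      LogFactory.getLog(CompletedContainerLogging.class);

  static void logUntrackedCompletedContainer(ContainerStatus status) {
    ContainerId containerId = status.getContainerId();
    ApplicationId appId =
        containerId.getApplicationAttemptId().getApplicationId();
    // Spell out what actually happened instead of "Null container completed".
    LOG.info("Container " + containerId + " completed with state "
        + status.getState() + " but is not tracked as an alive container;"
        + " application " + appId + " may be completed or unknown.");
  }
}
{code}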

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329856#comment-14329856
 ] 

Wangda Tan commented on YARN-3131:
--

Hi [~lichangleo],
Thanks for working on this, some (minor) comments:
1) There are several duplicated calls to {{getApplicationReport}} in 
YarnClientImpl; you can merge them into a single one.
2) The changes to TestApplicationClientProtocolOnHA actually changed the behavior of 
the test; it's better to fix the test instead.
3) Code style: I noticed mixed use of tabs and spaces in the patch.

Wangda

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch


 Just ran into an issue where a job submitted to a non-existent queue causes 
 YarnClient to raise no exception. Though that job does get submitted 
 successfully and just fails immediately afterwards, it would be better if 
 YarnClient could handle the immediate-failure situation like YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329921#comment-14329921
 ] 

Wangda Tan commented on YARN-2986:
--

Ver.1 patch uploaded to YARN-3233

 (Umbrella) Support hierarchical and unified scheduler configuration
 ---

 Key: YARN-2986
 URL: https://issues.apache.org/jira/browse/YARN-2986
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
 Attachments: YARN-2986.1.patch


 Today's scheduler configuration is fragmented and non-intuitive, and needs to 
 be improved. Details in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated

2015-02-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329431#comment-14329431
 ] 

Naganarasimha G R commented on YARN-2854:
-

Hi [~zjshen],
Based on earlier discussions, you had mentioned that we need to target this JIRA for 
2.7, and in the community forum there were discussions that the 2.7 release will 
happen in a week or two. So could you review this doc update so that we have 
sufficient time for review and rework?

 The document about timeline service and generic service needs to be updated
 ---

 Key: YARN-2854
 URL: https://issues.apache.org/jira/browse/YARN-2854
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Naganarasimha G R
Priority: Critical
 Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, 
 YARN-2854.20150128.1.patch, timeline_structure.jpg






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329435#comment-14329435
 ] 

Varun Saxena commented on YARN-3197:


Hmm... non-alive was meant to indicate that it's not found in aliveContainers in 
SchedulerApplicationAttempt.

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-20 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329436#comment-14329436
 ] 

Marcelo Vanzin commented on YARN-2423:
--

Hey everybody,

Just wanted to point out that this bug is currently marked as a blocker for 
integration between Spark and the ATS. It would be really great to avoid having to 
write our own REST client just to talk to the ATS, and if the same API can be 
used to support YARN-2928, even better.

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
 Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
 YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
 YARN-2423.patch


 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-20 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329442#comment-14329442
 ] 

Li Lu commented on YARN-3166:
-

Hello guys, sorry for the late reply...

bq. RM and NM modules will depend on timeline service module?
I agree that for the RM this is almost unavoidable. I think it's reasonable to let 
the RM/NMs (in the future) depend on timeline services, but we need to be careful 
about cyclic dependencies. 

bq. What is the difference between TimelineStorage and TimelineStorageImpl?
We may need some renaming here. In my original thinking, the TimelineStorage class 
translates operations on our object model into data-storage-layer method calls. 
These methods would then be implemented by a TimelineStorageImpl object (and its 
subclasses, of course). A rough sketch of that separation is below.
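
A rough sketch of the separation described above (all class and method signatures 
here are illustrative, based only on this discussion, not on committed code):

{code:java}
import java.io.IOException;

// Storage-backend side: concrete subclasses (e.g. HBase- or file-based)
// would implement these low-level operations.
abstract class TimelineStorageImpl {
  abstract void write(String entityId, byte[] serializedEntity) throws IOException;
  abstract byte[] read(String entityId) throws IOException;
}

// Object-model side: translates timeline-entity operations into storage-layer calls.
class TimelineStorage {
  private final TimelineStorageImpl backend;

  TimelineStorage(TimelineStorageImpl backend) {
    this.backend = backend;
  }

  void putEntity(String entityId, byte[] serializedEntity) throws IOException {
    // Translation from the object model to storage-layer method calls happens here.
    backend.write(entityId, serializedEntity);
  }

  byte[] getEntity(String entityId) throws IOException {
    return backend.read(entityId);
  }
}
{code}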

 [Source organization] Decide detailed package structures for timeline service 
 v2 components
 ---

 Key: YARN-3166
 URL: https://issues.apache.org/jira/browse/YARN-3166
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Open this JIRA to track all discussions on detailed package structures for 
 timeline services v2. This JIRA is for discussion only.
 For our current timeline service v2 design, aggregator (previously called 
 writer) implementation is in hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
 In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
 want to put reader related implementations into hadoop-yarn-server's:
 {{org.apache.hadoop.yarn.server.timelineservice.reader}}
 Both readers and aggregators will expose features that may be used by YARN 
 and other 3rd party components, such as aggregator/reader APIs. For those 
 features, maybe we would like to expose their interfaces to 
 hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
 Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329441#comment-14329441
 ] 

Wangda Tan commented on YARN-3197:
--

If so, the message should indicate that the container cannot be found in the 
aliveContainers of SchedulerApplicationAttempt, to be clearer.

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

