[jira] [Resolved] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276

2015-05-11 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou resolved YARN-1231.
-
Resolution: Won't Fix

YARN-2637 has fixed the problem described in YARN-276,
so this ticket no longer needs to be fixed.

 Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
 

 Key: YARN-1231
 URL: https://issues.apache.org/jira/browse/YARN-1231
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou
Assignee: Nemon Lou
  Labels: test
 Attachments: YARN-1231.patch


 Use a separate JIRA to fix YARN's test cases that will fail by hitting the 
 max-am-used-resources-percent limit after YARN-276.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API

2014-01-09 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867413#comment-13867413
 ] 

Nemon Lou commented on YARN-1033:
-

Thanks, Karthik Kambatla. You are really efficient.
+1 (non-binding)
I agree that the HA state in JMX can be added later in another JIRA when needed.


 Expose RM active/standby state to Web UI and REST API
 -

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Nemon Lou
Assignee: Karthik Kambatla
 Attachments: yarn-1033-1.patch


 Both the active and the standby RM shall expose a web server and show the current 
 state (active or standby) on the web page. Users should be able to access this 
 information through the REST API as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1033) Expose RM active/standby state to web UI and metrics

2014-01-07 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated YARN-1033:


Assignee: Karthik Kambatla  (was: Nemon Lou)

 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Nemon Lou
Assignee: Karthik Kambatla

 Both the active and the standby RM shall expose a web server and show the current 
 state (active or standby) on the web page.
 Cluster metrics also need this state for monitoring.
 The standby RM's web services shall refuse client requests unless querying for the 
 RM state.
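For context, the resolved behavior is queryable over HTTP. Below is a minimal Java sketch, assuming the RM web app listens at rm-host:8088 and that the cluster-info response carries an haState field; both are assumptions for illustration, not taken from this patch:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmHaStateProbe {
  public static void main(String[] args) throws Exception {
    // Hypothetical RM address; replace with a real ResourceManager host.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/info");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    StringBuilder body = new StringBuilder();
    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      body.append(line);
    }
    in.close();
    // Crude check; a real client would parse the JSON properly.
    System.out.println(body.indexOf("STANDBY") >= 0 ? "standby" : "active");
  }
}
{code}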



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1033) Expose RM active/standby state to web UI and metrics

2014-01-07 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864948#comment-13864948
 ] 

Nemon Lou commented on YARN-1033:
-

Hi, Karthik Kambatla. Feel free to take it. :)
Thanks.

 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Nemon Lou
Assignee: Nemon Lou

 Both the active and the standby RM shall expose a web server and show the current 
 state (active or standby) on the web page.
 Cluster metrics also need this state for monitoring.
 The standby RM's web services shall refuse client requests unless querying for the 
 RM state.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276

2013-09-24 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated YARN-1231:


Attachment: YARN-1231.patch

A patch fixing the test cases in the hadoop-yarn-server-resourcemanager project.

 Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
 

 Key: YARN-1231
 URL: https://issues.apache.org/jira/browse/YARN-1231
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou
Assignee: Nemon Lou
  Labels: test
 Attachments: YARN-1231.patch


 Use a separate JIRA to fix YARN's test cases that will fail by hitting the 
 max-am-used-resources-percent limit after YARN-276.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276

2013-09-23 Thread Nemon Lou (JIRA)
Nemon Lou created YARN-1231:
---

 Summary: Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
 Key: YARN-1231
 URL: https://issues.apache.org/jira/browse/YARN-1231
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou
Assignee: Nemon Lou


Use a separate JIRA to fix YARN's test cases that will fail by hitting the 
max-am-used-resources-percent limit after YARN-276.
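The usual way such tests get unblocked (shown here only as a hedged sketch, not as the attached patch) is to raise the AM resource percent in the test configuration so the new check from YARN-276 cannot reject the applications a test submits:

{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

// Inside a test's setup method (sketch only, not the attached patch):
// let AMs use up to 100% of the queue so the YARN-276 limit never
// blocks the applications the test submits.
CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
conf.setFloat(
    CapacitySchedulerConfiguration.MAXIMUM_APPLICATION_MASTERS_RESOURCE_PERCENT,
    1.0f);
{code}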


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1196) LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good

2013-09-13 Thread Nemon Lou (JIRA)
Nemon Lou created YARN-1196:
---

 Summary: LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good
 Key: YARN-1196
 URL: https://issues.apache.org/jira/browse/YARN-1196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou


A simple way to reproduce it:
1. Change the access mode of one node manager's local-dirs to 000.
After a few seconds, this node manager will become unhealthy.
2. Change the access mode of one node manager's local-dirs back to normal.
The node manager is still unhealthy, with all local-dirs in a bad state, even after 
a long time.
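A sketch of the missing re-check, with illustrative names (this is not the actual LocalDirsHandlerService code): periodically re-test the failed dirs and return recovered ones to service.

{code}
import java.io.File;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only, not the real LocalDirsHandlerService:
// move a failed dir back to the good list once it is usable again.
class FailedDirRecheck {
  void recheckFailedDirs(List<String> failedDirs, List<String> goodDirs) {
    for (Iterator<String> it = failedDirs.iterator(); it.hasNext();) {
      File dir = new File(it.next());
      if (dir.isDirectory() && dir.canRead() && dir.canWrite()) {
        goodDirs.add(dir.getPath());
        it.remove();
      }
    }
  }
}
{code}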


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1196) LocalDirsHandlerService never changes failedDirs back to normal even when these disks turn good

2013-09-13 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated YARN-1196:


Description: 
A simple way to reproduce it:
1. Change the access mode of one node manager's local-dirs to 000.
After a few seconds, this node manager will become unhealthy.
2. Change the access mode of the node manager's local-dirs back to normal.
The node manager is still unhealthy, with all local-dirs in a bad state, even after 
a long time.


  was:
A simple way to reproduce it:
1. Change the access mode of one node manager's local-dirs to 000.
After a few seconds, this node manager will become unhealthy.
2. Change the access mode of one node manager's local-dirs back to normal.
The node manager is still unhealthy, with all local-dirs in a bad state, even after 
a long time.



 LocalDirsHandlerService never changes failedDirs back to normal even when 
 these disks turn good
 --

 Key: YARN-1196
 URL: https://issues.apache.org/jira/browse/YARN-1196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Nemon Lou

 A simple way to reproduce it:
 1. Change the access mode of one node manager's local-dirs to 000.
 After a few seconds, this node manager will become unhealthy.
 2. Change the access mode of the node manager's local-dirs back to normal.
 The node manager is still unhealthy, with all local-dirs in a bad state, even 
 after a long time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-938) Hadoop 2 benchmarking

2013-09-10 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763838#comment-13763838
 ] 

Nemon Lou commented on YARN-938:


Thanks, Mayank Bansal, for your work. Do you mind sharing how much input data 
you ran for TeraSort?

 Hadoop 2 benchmarking 
 --

 Key: YARN-938
 URL: https://issues.apache.org/jira/browse/YARN-938
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: Hadoop-benchmarking-2.x-vs-1.x.xls


 I am running the benchmarks on Hadoop 2 and will update the results soon.
 Thanks,
 Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-842) Resource Manager & Node Manager UIs don't work with IE

2013-09-08 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761600#comment-13761600
 ] 

Nemon Lou commented on YARN-842:


Following Harsh J's advice, a different error occurs:

{code}

user agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; 
SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center 
PC 6.0; aff-kingsoft-ciba)
timestamp: Mon, 9 Sep 2013 03:45:49 UTC


message: Object doesn't support this property or method
line: 652
char: 21
code: 0
URI: http://158.1.131.13:8088/static/jt/jquery.jstree.js
{code}

Any suggestions?


 Resource Manager & Node Manager UIs don't work with IE
 -

 Key: YARN-842
 URL: https://issues.apache.org/jira/browse/YARN-842
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K

 {code:xml}
 Webpage error details
 User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; 
 SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media 
 Center PC 6.0)
 Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
 Message: 'JSON' is undefined
 Line: 41
 Char: 218
 Code: 0
 URI: http://10.18.40.24:8088/cluster/apps
 {code}
 RM & NM UIs are not working with IE, showing the above error for every 
 link on the UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-26 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750861#comment-13750861
 ] 

Nemon Lou commented on YARN-292:


I will try to post my test results after applying this patch when I have time. 
I have no idea about the test case part.

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen
 Attachments: YARN-292.1.patch, YARN-292.2.patch, YARN-292.3.patch


 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-16 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742021#comment-13742021
 ] 

nemon lou commented on YARN-292:


Thanks, Zhijie Shen, for your update. Do you plan to add some test cases for it? I 
think the test part will be the most difficult one.

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen
 Attachments: YARN-292.1.patch, YARN-292.2.patch, YARN-292.3.patch


 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-08-15 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740880#comment-13740880
 ] 

nemon lou commented on YARN-292:


The FIFO scheduler uses a TreeMap to keep applications in FIFO 
order; a ConcurrentHashMap would break this feature. Right?
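The ordering point can be checked independently of the scheduler code; a small self-contained illustration:

{code}
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class OrderDemo {
  public static void main(String[] args) {
    // A TreeMap iterates in key order, so lower application ids come first.
    Map<Integer, String> fifo = new TreeMap<Integer, String>();
    // A ConcurrentHashMap gives no iteration-order guarantee at all.
    Map<Integer, String> unordered = new ConcurrentHashMap<Integer, String>();
    for (int id : new int[] {3, 1, 2}) {
      fifo.put(id, "app_" + id);
      unordered.put(id, "app_" + id);
    }
    System.out.println(fifo.keySet());      // always [1, 2, 3]
    System.out.println(unordered.keySet()); // order is not defined
  }
}
{code}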

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen
 Attachments: YARN-292.1.patch, YARN-292.2.patch


 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1027) Implement RMHAServiceProtocol

2013-08-06 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-1027:


Assignee: Karthik Kambatla  (was: nemon lou)

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla

 Implement the existing HAServiceProtocol from Hadoop Common. This protocol is the 
 single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1033) Expose RM active/standby state to web UI and metrics

2013-08-06 Thread nemon lou (JIRA)
nemon lou created YARN-1033:
---

 Summary: Expose RM active/standby state to web UI and metrics
 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: nemon lou


Both the active and the standby RM shall expose a web server and show the current 
state (active or standby) on the web page.
Cluster metrics also need this state for monitoring.
The RM's web services shall refuse client requests unless querying for the RM state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1033) Expose RM active/standby state to web UI and metrics

2013-08-06 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-1033:


Assignee: nemon lou

 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: nemon lou
Assignee: nemon lou

 Both the active and the standby RM shall expose a web server and show the current 
 state (active or standby) on the web page.
 Cluster metrics also need this state for monitoring.
 The RM's web services shall refuse client requests unless querying for the RM state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1033) Expose RM active/standby state to web UI and metrics

2013-08-06 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-1033:


Description: 
Both the active and the standby RM shall expose a web server and show the current 
state (active or standby) on the web page.
Cluster metrics also need this state for monitoring.
The standby RM's web services shall refuse client requests unless querying for the 
RM state.

  was:
Both the active and the standby RM shall expose a web server and show the current 
state (active or standby) on the web page.
Cluster metrics also need this state for monitoring.
The RM's web services shall refuse client requests unless querying for the RM state.


 Expose RM active/standby state to web UI and metrics
 

 Key: YARN-1033
 URL: https://issues.apache.org/jira/browse/YARN-1033
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: nemon lou
Assignee: nemon lou

 Both the active and the standby RM shall expose a web server and show the current 
 state (active or standby) on the web page.
 Cluster metrics also need this state for monitoring.
 The standby RM's web services shall refuse client requests unless querying for the 
 RM state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-05 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729348#comment-13729348
 ] 

nemon lou commented on YARN-1027:
-

I had also started working on this since it was unassigned.
It's OK to take it up; I will review the patch. :)


 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: nemon lou

 Implement the existing HAServiceProtocol from Hadoop Common. This protocol is the 
 single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1027) Implement RMHAServiceProtocol

2013-08-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-1027:


Assignee: nemon lou

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: nemon lou

 Implement the existing HAServiceProtocol from Hadoop Common. This protocol is the 
 single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-06-05 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
  Labels: incompatible
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).
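To make the failure mode concrete, here is a hedged sketch of the arithmetic (the exact formula in the CapacityScheduler may differ in detail):

{code}
// Sketch: the limit counts applications, not actual AM resources.
// Assume a 100 GB cluster, a 10% AM percent, and a 1 GB minimum allocation.
int clusterMemoryMB = 100 * 1024;
float maxAMResourcePercent = 0.10f;
int minimumAllocationMB = 1024;

// maxActiveApplications is derived from minimumAllocation...
int maxActiveApplications =
    (int) Math.ceil(clusterMemoryMB * maxAMResourcePercent / minimumAllocationMB);
// ...admitting 10 applications. If each AM actually runs in a 4 GB
// container, the AMs alone hold 40 GB, four times the intended 10% budget,
// and nothing re-checks what the AMs really use.
{code}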

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-764) blank Used Resources on Capacity Scheduler page

2013-06-05 Thread nemon lou (JIRA)
nemon lou created YARN-764:
--

 Summary: blank Used Resources on Capacity Scheduler page 
 Key: YARN-764
 URL: https://issues.apache.org/jira/browse/YARN-764
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: nemon lou
Assignee: nemon lou


Even when there are jobs running, Used Resources is empty on the Capacity Scheduler 
page for leaf queues. (I use Google Chrome on Windows 7.)
After changing Resource.java's toString method by replacing <> with {}, this 
bug gets fixed.
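The symptom is consistent with the browser treating the string as markup; a small illustration (the exact Resource.toString format here is an assumption):

{code}
// Illustration only: a value like "<memory:4096, vCores:4>" embedded in a
// page unescaped looks like an HTML tag to the browser, so the cell shows
// up blank. Braces, or proper HTML escaping, avoid that.
String angle = "<memory:4096, vCores:4>";  // rendered as a bogus tag
String brace = "{memory:4096, vCores:4}";  // rendered literally
String escaped = angle.replace("<", "&lt;").replace(">", "&gt;");
System.out.println(escaped);  // prints &lt;memory:4096, vCores:4&gt;
{code}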

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-764) blank Used Resources on Capacity Scheduler page

2013-06-05 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-764:
---

Attachment: YARN-764.patch

No test case is added, since it's only a symbol change in toString().

 blank Used Resources on Capacity Scheduler page 
 

 Key: YARN-764
 URL: https://issues.apache.org/jira/browse/YARN-764
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-764.patch


 Even when there are jobs running, Used Resources is empty on the Capacity 
 Scheduler page for leaf queues. (I use Google Chrome on Windows 7.)
 After changing Resource.java's toString method by replacing <> with 
 {}, this bug gets fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-764) blank Used Resources on Capacity Scheduler page

2013-06-05 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-764:
---

Attachment: YARN-764.patch

Changing the patch as Thomas suggested; escaping the HTML is definitely a better way.

 blank Used Resources on Capacity Scheduler page 
 

 Key: YARN-764
 URL: https://issues.apache.org/jira/browse/YARN-764
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-764.patch, YARN-764.patch


 Even when there are jobs running, Used Resources is empty on the Capacity 
 Scheduler page for leaf queues. (I use Google Chrome on Windows 7.)
 After changing Resource.java's toString method by replacing <> with 
 {}, this bug gets fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-606) negative queue metrics apps Failed

2013-05-01 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-606:
---

Assignee: nemon lou

 negative  queue metrics apps Failed
 -

 Key: YARN-606
 URL: https://issues.apache.org/jira/browse/YARN-606
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Assignee: nemon lou
Priority: Minor

 The queue metric Apps Failed can be negative in some cases (more than one 
 attempt for an application can cause this).
 It's confusing if we use this metric directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart

2013-04-24 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640150#comment-13640150
 ] 

nemon lou commented on YARN-513:


What about the admin client? refreshQueues, refreshNodes, etc.
These will be needed in HA.

 Verify all clients will wait for RM to restart
 --

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong

 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-606) negative queue metrcis apps Failed

2013-04-24 Thread nemon lou (JIRA)
nemon lou created YARN-606:
--

 Summary: negative  queue metrcis apps Failed
 Key: YARN-606
 URL: https://issues.apache.org/jira/browse/YARN-606
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor


The queue metric Apps Failed can be negative in some cases (more than one attempt 
for an application can cause this).
It's confusing if we use this metric directly.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-606) negative queue metrcis apps Failed

2013-04-24 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640214#comment-13640214
 ] 

nemon lou commented on YARN-606:


The submitApp() method in QueueMetrics.java causes the negative value; it has this logic:
{code}
public void submitApp(String user, int attemptId) {
  if (attemptId == 1) {
    appsSubmitted.incr();
  } else {
    // Every attempt after the first decrements the failed counter, so
    // appsFailed can go negative when decrements outnumber increments.
    appsFailed.decr();
  }
  ...
}
{code}
This was introduced by MAPREDUCE-3870.

 negative  queue metrcis apps Failed
 -

 Key: YARN-606
 URL: https://issues.apache.org/jira/browse/YARN-606
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor

 The queue metric Apps Failed can be negative in some cases (more than one 
 attempt for an application can cause this).
 It's confusing if we use this metric directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-606) negative queue metrics apps Failed

2013-04-24 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-606:
---

Summary: negative  queue metrics apps Failed  (was: negative  queue 
metrcis apps Failed)

 negative  queue metrics apps Failed
 -

 Key: YARN-606
 URL: https://issues.apache.org/jira/browse/YARN-606
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor

 The queue metric Apps Failed can be negative in some cases (more than one 
 attempt for an application can cause this).
 It's confusing if we use this metric directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-23 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-23 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Labels: incompatible  (was: )

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
  Labels: incompatible
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-15 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-15 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml

2013-04-15 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou resolved YARN-20.
---

Resolution: Won't Fix

 More information for yarn.resourcemanager.webapp.address in yarn-default.xml
 --

 Key: YARN-20
 URL: https://issues.apache.org/jira/browse/YARN-20
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: nemon lou
Priority: Trivial
 Attachments: YARN-20.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

   The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is 
 in host:port format, which is noted in the cluster setup guide 
 (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html).
   When I read through the code, I found that a host-only format is also supported. In 
 the host-only format, the port will be random.
   So we may add more documentation in yarn-default.xml to make this easier to understand.
   I will submit a patch if it's helpful.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-12 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630939#comment-13630939
 ] 

nemon lou commented on YARN-276:


[~tgraves]
Here are my initial thoughts on checking the cluster-level AM resource percent in 
each leaf queue:
A leaf queue's capacity for this check is computed based on absoluteMaxCapacity. 
Considering we have 10 leaf queues, each configured with 100% 
absoluteMaxCapacity and a 10% maxAMResourcePerQueuePercent,
there is still a chance that all the leaf queues' resources are taken up by AMs before 
any queue reaches its 10% maxAMResourcePerQueuePercent limit.

Note that the cluster-wide AM resource percent only applies to a leaf queue if no 
AM resource percent is configured for that leaf queue.

As Thomas Graves mentioned, cluster-level checking would cause one queue to 
restrict another. I will remove the cluster-level checking.
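The 10-queue scenario in numbers, as a hedged sketch of the interaction described above:

{code}
// Sketch: 10 leaf queues, each with 100% absoluteMaxCapacity and a
// 10% per-queue AM limit, on a 100 GB cluster.
int clusterMemoryMB = 100 * 1024;
int queues = 10;
float absoluteMaxCapacity = 1.0f;  // every queue may grow to the full cluster
float maxAMResourcePerQueuePercent = 0.10f;

// Each queue's AM budget is computed against its own (full-cluster) maximum:
int perQueueAmBudgetMB = (int)
    (clusterMemoryMB * absoluteMaxCapacity * maxAMResourcePerQueuePercent);
// 10 queues x 10 GB of AM budget covers the entire 100 GB cluster, so AMs
// can hold everything while every queue stays under its own 10% limit.
int totalAmBudgetMB = queues * perQueueAmBudgetMB;
{code}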






 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-12 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

Uploading an interim patch.

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-04-01 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Patch rebased.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Assignee: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
 YARN-447-trunk.patch, YARN-447-trunk.patch


 Now the compare code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. It leaves the ApplicationId comparison logic to the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change; the ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-03-28 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617008#comment-13617008
 ] 

nemon lou commented on YARN-276:


[~zjshen]
Yes, a dynamic maxActiveApplications will work, too, and there is no need to add any 
new criteria. I'll give it a try.
Thanks.


 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity 
 Scheduler can hang with most resources taken up by AMs, leaving not enough 
 resources for tasks. All applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used for maxActiveApplications, 
 and maxActiveApplications is computed from minimumAllocation (not from what 
 AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)
nemon lou created YARN-447:
--

 Summary: applicationComparator improvement for CS
 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch

Now the compare code is:
return a1.getApplicationId().getId() - a2.getApplicationId().getId();

It will be replaced with:
return a1.getApplicationId().compareTo(a2.getApplicationId());

This brings some benefits:
1. It leaves the ApplicationId comparison logic to the ApplicationId class.
2. In a future HA mode the cluster timestamp may change; the ApplicationId class already 
takes care of this condition.
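A hedged sketch of the change as a standalone comparator (the class name is illustrative; the real patch edits the scheduler's comparator in place):

{code}
import java.util.Comparator;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustrative comparator: subtracting raw ids can overflow for large
// values and ignores the cluster timestamp, while compareTo handles both.
class ApplicationIdOrder implements Comparator<ApplicationId> {
  @Override
  public int compare(ApplicationId a1, ApplicationId a2) {
    return a1.compareTo(a2);
  }
}
{code}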

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Attaching a simple patch with a test case.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch


 Now the compare code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. It leaves the ApplicationId comparison logic to the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change; the ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Use a real ApplicationId instead of a mocked one in TestUtil, so ApplicationId's 
compareTo method will do its work.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch


 Now the compare code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. It leaves the ApplicationId comparison logic to the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change; the ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Adding a timeout.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
 YARN-447-trunk.patch


 Now the compare code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. It leaves the ApplicationId comparison logic to the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change; the ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593135#comment-13593135
 ] 

nemon lou commented on YARN-447:


This patch is ready for review now.
Thank you.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
 YARN-447-trunk.patch


 Now the compare code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. It leaves the ApplicationId comparison logic to the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change; the ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-111) Application level priority in Resource Manager Schedulers

2013-02-25 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou resolved YARN-111.


Resolution: Won't Fix

 Application level priority in Resource Manager Schedulers
 -

 Key: YARN-111
 URL: https://issues.apache.org/jira/browse/YARN-111
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.1-alpha
Reporter: nemon lou

 We need application-level priority for Hadoop 2.0, both in the FIFO Scheduler and 
 the Capacity Scheduler.
 In Hadoop 1.0.x, job priority is supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml

2013-02-06 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-20:
--

Attachment: YARN-20.patch

Adding the annotation just as Harsh J suggested. Sorry for coming back so late.
No test case is added since this is only a trivial documentation change.

 More information for yarn.resourcemanager.webapp.address in yarn-default.xml
 --

 Key: YARN-20
 URL: https://issues.apache.org/jira/browse/YARN-20
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.0-alpha
Reporter: nemon lou
Priority: Trivial
 Attachments: YARN-20.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

   The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is
 given in host:port format, as noted in the cluster setup guide
 (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html).
   When I read through the code, I found that a bare host format is also
 supported; in that case the port is chosen at random.
   So we could add more documentation to yarn-default.xml to make this easier
 to understand. A sketch of such a note follows below.
   I will submit a patch if it's helpful.
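
The following is a hedged sketch of the kind of note the patch could add to
yarn-default.xml; the wording is illustrative, not the committed text:

<property>
  <description>
    The address of the RM web application, normally given as host:port.
    <!-- Assumption from the report above: if only a host is given (no
         ":port"), the web server binds to a random port, so the host:port
         form is recommended for predictable access. -->
  </description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>0.0.0.0:8088</value>
</property>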

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-111) Application level priority in Resource Manager Schedulers

2013-02-06 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573142#comment-13573142
 ] 

nemon lou commented on YARN-111:


In the end I used two queues in the Capacity Scheduler to basically meet our
needs; a configuration sketch follows below.
Both queues have an absolute max capacity of 100%. The higher-priority queue
is configured with more absolute capacity (85%).
Jobs that need high priority are submitted to the queue with the larger
absolute capacity.
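
A hedged capacity-scheduler.xml sketch of this two-queue workaround; the queue
names "high" and "low" are hypothetical, not from the original setup:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>high,low</value>
</property>
<property>
  <!-- the higher-priority queue gets the larger guaranteed share -->
  <name>yarn.scheduler.capacity.root.high.capacity</name>
  <value>85</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.low.capacity</name>
  <value>15</value>
</property>
<property>
  <!-- both queues may still grow to the whole cluster when it is idle -->
  <name>yarn.scheduler.capacity.root.high.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.low.maximum-capacity</name>
  <value>100</value>
</property>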

 Application level priority in Resource Manager Schedulers
 -

 Key: YARN-111
 URL: https://issues.apache.org/jira/browse/YARN-111
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.1-alpha
Reporter: nemon lou

 We need application-level priority in Hadoop 2.0, both in the FIFO scheduler
 and in the Capacity Scheduler.
 In Hadoop 1.0.x, job priority is supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication

2013-02-06 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573165#comment-13573165
 ] 

nemon lou commented on YARN-374:


Thanks for the information.
But why not add one more API such as gracefullyKillApplication (or just change
force kill's behavior)? With this method the RM would first ask the AM to kill
the application itself, and a force kill would be triggered only if the AM has
not done so within some period. A sketch of the idea follows below.
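
A hypothetical sketch of the flow proposed above; gracefullyKillApplication is
not an existing YARN API, and the abstract helpers stand in for RM internals:

// None of this is existing YARN code; it only illustrates the proposal.
abstract class GracefulKillSketch {
  abstract void askAmToShutDown(String appId);       // hypothetical: AM can
                                                     // unregister and write
                                                     // its job history
  abstract boolean isApplicationFinished(String appId); // hypothetical check
  abstract void forceKillApplication(String appId);  // today's hard kill

  void gracefullyKillApplication(String appId, long gracePeriodMs)
      throws InterruptedException {
    askAmToShutDown(appId);   // step 1: give the AM a chance to kill itself
    long deadline = System.currentTimeMillis() + gracePeriodMs;
    while (System.currentTimeMillis() < deadline) {
      if (isApplicationFinished(appId)) {
        return;               // AM finished on its own; a JHS entry exists
      }
      Thread.sleep(200);
    }
    forceKillApplication(appId);  // step 2: fall back to the force kill
  }
}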

 Job History Server doesn't show jobs which killed by 
 ClientRMProtocol.forceKillApplication
 --

 Key: YARN-374
 URL: https://issues.apache.org/jira/browse/YARN-374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: nemon lou

 After I kill an app by typing bin/yarn rmadmin app -kill APP_ID,
 no job info is kept on the JHS web page.
 However, when I kill a job by typing bin/mapred job -kill JOB_ID,
 I can see the killed job left on the JHS.
 Some Hive users are confused that their jobs were killed but nothing is left
 on the JHS, and the killed app's info on the RM web page is not enough. (They
 kill jobs via ClientRMProtocol.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication

2013-02-06 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573238#comment-13573238
 ] 

nemon lou commented on YARN-374:


Agree that YARN-321 will help.

 Job History Server doesn't show jobs which killed by 
 ClientRMProtocol.forceKillApplication
 --

 Key: YARN-374
 URL: https://issues.apache.org/jira/browse/YARN-374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: nemon lou

 After I kill an app by typing bin/yarn rmadmin app -kill APP_ID,
 no job info is kept on the JHS web page.
 However, when I kill a job by typing bin/mapred job -kill JOB_ID,
 I can see the killed job left on the JHS.
 Some Hive users are confused that their jobs were killed but nothing is left
 on the JHS, and the killed app's info on the RM web page is not enough. (They
 kill jobs via ClientRMProtocol.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication

2013-02-02 Thread nemon lou (JIRA)
nemon lou created YARN-374:
--

 Summary: Job History Server doesn't show jobs which killed by 
ClientRMProtocol.forceKillApplication
 Key: YARN-374
 URL: https://issues.apache.org/jira/browse/YARN-374
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nemon lou


After I kill an app by typing bin/yarn rmadmin app -kill APP_ID,
no job info is kept on the JHS web page.
However, when I kill a job by typing bin/mapred job -kill JOB_ID,
I can see the killed job left on the JHS.
Some Hive users are confused that their jobs were killed but nothing is left
on the JHS, and the killed app's info on the RM web page is not enough. (They
kill jobs via ClientRMProtocol.)




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication

2013-02-02 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-374:
---

  Component/s: resourcemanager
   client
Affects Version/s: 2.0.1-alpha

The difference between bin/yarn and bin/mapred is that one uses ClientRMProtocol
to send the request to the RM, while the other uses MRClientProtocol to send it
to the AM.

 Job History Server doesn't show jobs which killed by 
 ClientRMProtocol.forceKillApplication
 --

 Key: YARN-374
 URL: https://issues.apache.org/jira/browse/YARN-374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: nemon lou

 After I kill an app by typing bin/yarn rmadmin app -kill APP_ID,
 no job info is kept on the JHS web page.
 However, when I kill a job by typing bin/mapred job -kill JOB_ID,
 I can see the killed job left on the JHS.
 Some Hive users are confused that their jobs were killed but nothing is left
 on the JHS, and the killed app's info on the RM web page is not enough. (They
 kill jobs via ClientRMProtocol.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-24 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539260#comment-13539260
 ] 

nemon lou commented on YARN-276:


Updating the patch.
Four properties have been added to the CS web page:
Max AM Used Per Queue Percent
Actual AM Used Per Queue Percent
Max AM Used Percent For Cluster
Actual AM Used Percent For Cluster

This patch keeps track of the resources that AMs actually use and checks the
limit at both the cluster level and the leaf-queue level; a sketch of the
check follows below.
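
A hedged sketch (hypothetical names and a memory-only view, not the actual
patch) of checking maximum-am-resource-percent against the resources AMs
actually use, instead of deriving a maxActiveApplications bound from the
minimum allocation:

class AmLimitSketch {
  /** Returns true if a new AM needing amAskMb may be activated. */
  static boolean canActivateApplication(long amAskMb,
                                        long amUsedInQueueMb,
                                        long amUsedInClusterMb,
                                        long queueMaxMb,
                                        long clusterMb,
                                        float maxAmResourcePercent) {
    // Leaf-queue check: AM memory already used in this queue plus the new
    // AM must stay within the queue's AM share.
    boolean queueOk =
        amUsedInQueueMb + amAskMb <= queueMaxMb * maxAmResourcePercent;
    // Cluster-level check: the same rule applied to the whole cluster.
    boolean clusterOk =
        amUsedInClusterMb + amAskMb <= clusterMb * maxAmResourcePercent;
    return queueOk && clusterOk;
  }
}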

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-20 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537643#comment-13537643
 ] 

nemon lou commented on YARN-276:


Good idea, Robert. Thank you for your comment.
I think it would be good to display the AM-used resources and the AM percent
limit (or the maximum resources that AMs can use) for each leaf queue on the
Capacity Scheduler page.



 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-19 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-19 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535809#comment-13535809
 ] 

nemon lou commented on YARN-276:


All YARN and MR tests passed on my own cluster, so I am submitting the patch
again.

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-19 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-19 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536818#comment-13536818
 ] 

nemon lou commented on YARN-276:


This patch is ready for review now. Thank you.

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-17 Thread nemon lou (JIRA)
nemon lou created YARN-276:
--

 Summary: Capacity Scheduler can hang when submit many jobs 
concurrently
 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.1-alpha, 3.0.0
Reporter: nemon lou


In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
can hang with most resources taken up by AMs and not enough resources left
for tasks; all applications then hang there.
The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
checked directly. Instead, the property is only used to compute
maxActiveApplications, and maxActiveApplications is computed from the minimum
allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2012-12-17 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
 Attachments: YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler
 can hang with most resources taken up by AMs and not enough resources left
 for tasks; all applications then hang there.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not
 checked directly. Instead, the property is only used to compute
 maxActiveApplications, and maxActiveApplications is computed from the minimum
 allocation (not from what AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-111) Application level priority in Resource Manager Schedulers

2012-09-20 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459420#comment-13459420
 ] 

nemon lou commented on YARN-111:


To Harsh J:
 I have looked into the code.
 The CS's LeafQueue keeps its active and pending applications in TreeSets, and
 the TreeSet comparator comes from the CapacityScheduler's
 applicationComparator, whose compare method looks like this:
   return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 (org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.java,
 line 106)
 So the application's priority doesn't take effect. Am I right?

To Robert:
  I'm not sure whether job priority has been removed from recent MR1 code, but
it would be very nice of you to add application-level priority in YARN. :)



 Application level priority in Resource Manager Schedulers
 -

 Key: YARN-111
 URL: https://issues.apache.org/jira/browse/YARN-111
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.1-alpha
Reporter: nemon lou

 We need application-level priority in Hadoop 2.0, both in the FIFO scheduler
 and in the Capacity Scheduler.
 In Hadoop 1.0.x, job priority is supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-111) Application level priority in Resource Manager Schedulers

2012-09-19 Thread nemon lou (JIRA)
nemon lou created YARN-111:
--

 Summary: Application level priority in Resource Manager Schedulers
 Key: YARN-111
 URL: https://issues.apache.org/jira/browse/YARN-111
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.1-alpha
Reporter: nemon lou


We need application-level priority in Hadoop 2.0, both in the FIFO scheduler
and in the Capacity Scheduler.
In Hadoop 1.0.x, job priority is supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-77) Test case TestAMRMRPCNodeUpdates.testAMRMUnusableNodes fails occasionally

2012-09-02 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447104#comment-13447104
 ] 

nemon lou commented on YARN-77:
---

In other words, the syncNodeHeartbeat method does not work synchronously:
DrainDispatcher's await() method can return before the event queue becomes
empty. A simplified sketch of the race follows below.
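
A hypothetical, simplified sketch of the race described above; this is not the
actual DrainDispatcher source. The racy wait returns as soon as the queue looks
empty, even though the handler thread may still be processing the last event:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class DrainSketch {
  private final BlockingQueue<Runnable> queue =
      new LinkedBlockingQueue<Runnable>();
  // True while the handler thread is processing an event.
  private final AtomicBoolean handling = new AtomicBoolean(false);

  // Handler loop, run on a separate thread.
  void handleLoop() throws InterruptedException {
    while (true) {
      Runnable event = queue.take();
      handling.set(true);   // note: a real fix would set this atomically
      try {                 // with the take() to close the remaining window
        event.run();
      } finally {
        handling.set(false);
      }
    }
  }

  // Racy wait: may return while the last event is still in flight.
  void awaitRacy() throws InterruptedException {
    while (!queue.isEmpty()) { Thread.sleep(10); }
  }

  // Safer wait: also requires that no event is being handled.
  void awaitDrained() throws InterruptedException {
    while (!queue.isEmpty() || handling.get()) { Thread.sleep(10); }
  }
}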

 Test case TestAMRMRPCNodeUpdates.testAMRMUnusableNodes fails occasionally
 -

 Key: YARN-77
 URL: https://issues.apache.org/jira/browse/YARN-77
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-alpha
 Environment: Linux 2.6.32.12-0.7-default  x86_64
 java version 1.6.0_26 Java HotSpot(TM) 64-Bit Server VM
Reporter: nemon lou
 Attachments: TestAMRMRPCNodeUpdates_output.TXT


 Test case 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes
  fails occasionally in my environment.
 Here is the error message; the standard output will be uploaded in a file
 later.
 Error Message:
 expected:<1> but was:<0>
 Stacktrace:
 junit.framework.AssertionFailedError: expected:<1> but was:<0>
   at junit.framework.Assert.fail(Assert.java:47)
   at junit.framework.Assert.failNotEquals(Assert.java:283)
   at junit.framework.Assert.assertEquals(Assert.java:64)
   at junit.framework.Assert.assertEquals(Assert.java:195)
   at junit.framework.Assert.assertEquals(Assert.java:201)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:123)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
   at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
   at 
 org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-77) Test case TestAMRMRPCNodeUpdates.testAMRMUnusableNodes fails occasionally

2012-09-01 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-77:
--

Attachment: TestAMRMRPCNodeUpdates_output.TXT

 Test case TestAMRMRPCNodeUpdates.testAMRMUnusableNodes fails occasionally
 -

 Key: YARN-77
 URL: https://issues.apache.org/jira/browse/YARN-77
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-alpha
 Environment: Linux 2.6.32.12-0.7-default  x86_64
 java version 1.6.0_26 Java HotSpot(TM) 64-Bit Server VM
Reporter: nemon lou
 Attachments: TestAMRMRPCNodeUpdates_output.TXT


 Test case 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes
  fails occasionally in my environment.
 Here is the error message; the standard output will be uploaded in a file
 later.
 Error Message:
 expected:<1> but was:<0>
 Stacktrace:
 junit.framework.AssertionFailedError: expected:<1> but was:<0>
   at junit.framework.Assert.fail(Assert.java:47)
   at junit.framework.Assert.failNotEquals(Assert.java:283)
   at junit.framework.Assert.assertEquals(Assert.java:64)
   at junit.framework.Assert.assertEquals(Assert.java:195)
   at junit.framework.Assert.assertEquals(Assert.java:201)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:123)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
   at 
 org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
   at 
 org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira