[jira] [Created] (YARN-1447) Common PB types define for container resource change

2013-11-25 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1447:


 Summary: Common PB types define for container resource change
 Key: YARN-1447
 URL: https://issues.apache.org/jira/browse/YARN-1447
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.2.0
Reporter: Wangda Tan
Assignee: Wangda Tan


As described in YARN-1197, we need to add some common PB types for container 
resource change, such as ResourceChangeContext. These types will be used by both 
the RM and NM protocols.
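
For context, such a type would likely follow YARN's usual abstract-record pattern backed 
by a generated protobuf message. A minimal sketch, with accessors assumed for illustration 
rather than taken from the final YARN-1197 design:
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch only: the exact fields were settled in YARN-1197, so these accessors
// are illustrative assumptions, not the committed API.
public abstract class ResourceChangeContext {
  // Container whose allocation is being changed.
  public abstract ContainerId getContainerId();
  public abstract void setContainerId(ContainerId containerId);

  // Target capability (memory/vcores) the container should be resized to.
  public abstract Resource getTargetCapability();
  public abstract void setTargetCapability(Resource target);
}
{code}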



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1449) Protocol changes in NM side to support change container resource

2013-11-25 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1449:


 Summary: Protocol changes in NM side to support change container 
resource
 Key: YARN-1449
 URL: https://issues.apache.org/jira/browse/YARN-1449
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Wangda Tan
Assignee: Wangda Tan


As described in YARN-1197, we need to add APIs in the NM to support:
1) Adding a changeContainersResources method to ContainerManagementProtocol (see the sketch below)
2) Getting the succeeded/failed increased/decreased containers in the response of 
changeContainersResources
3) Adding a new decreased-containers field to NodeStatus so the NM can notify the RM 
of such changes
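
A minimal sketch of what item 1) could look like; only the method name comes from this 
issue, and the request/response record names are placeholders, not the final API:
{code}
// Item 1): a method of roughly this shape added to ContainerManagementProtocol.
// ChangeContainersResourcesRequest/Response are placeholder names.
ChangeContainersResourcesResponse changeContainersResources(
    ChangeContainersResourcesRequest request) throws YarnException, IOException;

// Item 2): the response would report per-container results, e.g.
//   List<ContainerId> getSucceededChangedContainers();
//   Map<ContainerId, SerializedException> getFailedChangedContainers();
{code}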



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1502) Protocol changes and implementations in RM side to support change container resource

2013-12-12 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1502:


 Summary: Protocol changes and implementations in RM side to 
support change container resource
 Key: YARN-1502
 URL: https://issues.apache.org/jira/browse/YARN-1502
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan


As described in YARN-1197, we need API/implementation changes:
1) Add a List<ContainerResourceIncreaseRequest> to the YarnScheduler interface
2) Get resource-changed containers back in AllocateResponse
3) Add an implementation on the Capacity Scheduler side to support increase/decrease

For other details, please refer to the design doc and discussion in YARN-1197.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2013-12-16 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1509:


 Summary: Make AMRMClient support send increase container request 
and get increased/decreased containers
 Key: YARN-1509
 URL: https://issues.apache.org/jira/browse/YARN-1509
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


As described in YARN-1197, we need to add APIs in AMRMClient to support:
1) Adding an increase request (see the sketch below)
2) Getting the successfully increased/decreased containers back from the RM
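
A rough usage sketch of how an AM might drive such an API. The increase-request call and 
the increased/decreased-container accessors are assumptions describing this proposal, not 
an existing AMRMClient API; {{conf}} and {{container}} are assumed to be defined elsewhere:
{code}
// Sketch only: requestContainerResourceIncrease() and the response accessors
// below are placeholder names for the API this issue proposes.
AMRMClient<AMRMClient.ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
amrmClient.init(conf);
amrmClient.start();

// 1) Ask the RM to grow an already-allocated container.
amrmClient.requestContainerResourceIncrease(container, Resource.newInstance(4096, 4));

// 2) A later allocate() heartbeat returns the containers whose size change
//    the scheduler has granted or rejected.
AllocateResponse response = amrmClient.allocate(0.1f);
// response.getIncreasedContainers(); response.getDecreasedContainers();
{code}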



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1510) Make NMClient support change container resources

2013-12-16 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1510:


 Summary: Make NMClient support change container resources
 Key: YARN-1510
 URL: https://issues.apache.org/jira/browse/YARN-1510
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


As described in YARN-1197 and YARN-1449, we need to add APIs in NMClient to support:
1) sending requests to increase/decrease container resource limits
2) getting the succeeded/failed changed containers in the response from the NM.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1609) Add Service Container type to NodeManager in YARN

2014-01-15 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1609:


 Summary: Add Service Container type to NodeManager in YARN
 Key: YARN-1609
 URL: https://issues.apache.org/jira/browse/YARN-1609
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Wangda Tan
Assignee: Wangda Tan


From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found 
that it's important to have a framework-specific daemon process manage the tasks 
on each node directly. The daemon process, most likely similar in other 
frameworks as well, provides critical services to the tasks running on that 
node (for example “wireup”, spawning user processes in large numbers at once, etc.). 
In YARN, it's hard, if not impossible, to have those processes managed by YARN.

We propose to extend the container model on the NodeManager side to support a 
“Service Container” to run/manage such framework daemon/service processes. We 
believe this is very useful to other application-framework developers as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1644) [YARN-1197] Add newly decreased container to NodeStatus in NM side

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1644:


 Summary: [YARN-1197] Add newly decreased container to NodeStatus 
in NM side
 Key: YARN-1644
 URL: https://issues.apache.org/jira/browse/YARN-1644
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1643) [YARN-1197] Make ContainersMonitor can support change monitoring size of an allocated container in NM side

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1643:


 Summary: [YARN-1197] Make ContainersMonitor can support change 
monitoring size of an allocated container in NM side
 Key: YARN-1643
 URL: https://issues.apache.org/jira/browse/YARN-1643
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1648) [YARN-1197] Modify ApplicationMasterService to support changing container resource

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1648:


 Summary: [YARN-1197] Modify ApplicationMasterService to support 
changing container resource
 Key: YARN-1648
 URL: https://issues.apache.org/jira/browse/YARN-1648
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1651) [YARN-1197] Add methods in FiCaSchedulerApp to support add/reserve/unreserve/allocate/pull change container requests/results

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1651:


 Summary: [YARN-1197] Add methods in FiCaSchedulerApp to support 
add/reserve/unreserve/allocate/pull change container requests/results
 Key: YARN-1651
 URL: https://issues.apache.org/jira/browse/YARN-1651
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1650) [YARN-1197] Add pullDecreasedContainer API to RMNode which can be used by scheduler to get newly decreased Containers

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1650:


 Summary: [YARN-1197] Add pullDecreasedContainer API to RMNode 
which can be used by scheduler to get newly decreased Containers
 Key: YARN-1650
 URL: https://issues.apache.org/jira/browse/YARN-1650
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1646) [YARN-1197] Add increase container request to YarnScheduler allocate API

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1646:


 Summary: [YARN-1197] Add increase container request to 
YarnScheduler allocate API
 Key: YARN-1646
 URL: https://issues.apache.org/jira/browse/YARN-1646
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1649) [YARN-1197] Modify ResourceTrackerService to support passing decreased containers to RMNode

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1649:


 Summary: [YARN-1197] Modify ResourceTrackerService to support 
passing decreased containers to RMNode
 Key: YARN-1649
 URL: https://issues.apache.org/jira/browse/YARN-1649
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1647) [YARN-1197] Add increased/decreased container to Allocation

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1647:


 Summary: [YARN-1197] Add increased/decreased container to 
Allocation
 Key: YARN-1647
 URL: https://issues.apache.org/jira/browse/YARN-1647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1654) [YARN-1197] Add implementations to CapacityScheduler to support increase/decrease container resource

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1654:


 Summary: [YARN-1197] Add implementations to CapacityScheduler to 
support increase/decrease container resource
 Key: YARN-1654
 URL: https://issues.apache.org/jira/browse/YARN-1654
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1652) [YARN-1197] Add methods in FiCaSchedulerNode to support increase/decrease/reserve/unreserve change container requests/results

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1652:


 Summary: [YARN-1197] Add methods in FiCaSchedulerNode to support 
increase/decrease/reserve/unreserve change container requests/results
 Key: YARN-1652
 URL: https://issues.apache.org/jira/browse/YARN-1652
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1655) [YARN-1197] Add implementations to FairScheduler to support increase/decrease container resource

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1655:


 Summary: [YARN-1197] Add implementations to FairScheduler to 
support increase/decrease container resource
 Key: YARN-1655
 URL: https://issues.apache.org/jira/browse/YARN-1655
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1653) [YARN-1197] Add APIs in CSQueue to support decrease container resource and unreserve increase request

2014-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1653:


 Summary: [YARN-1197] Add APIs in CSQueue to support decrease 
container resource and unreserve increase request
 Key: YARN-1653
 URL: https://issues.apache.org/jira/browse/YARN-1653
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1871) We should eliminate writing *PBImpl code in YARN

2014-03-25 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1871:


 Summary: We should eliminate writing *PBImpl code in YARN
 Key: YARN-1871
 URL: https://issues.apache.org/jira/browse/YARN-1871
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan



Currently, we need to write PBImpl classes one by one. After running {{find . -name 
"*PBImpl*.java" | xargs wc -l}} under the Hadoop source code directory, we can see 
there are more than 25,000 LOC. I think we should improve this, which will be 
very helpful for YARN developers making changes to YARN protocols.


There are only some limited patterns in the current *PBImpl classes:
* Simple types, like string, int32, float
* List<?> types
* Map<?> types
* Enum types
Code generation should be enough to produce such PBImpl classes.
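
For illustration (not a literal excerpt), the per-field boilerplate that every hand-written 
*PBImpl repeats today, and that a generator could emit instead, looks roughly like this:
{code}
// Representative pattern only; field and proto names vary per record class.
@Override
public String getQueueName() {
  GetQueueInfoRequestProtoOrBuilder p = viaProto ? proto : builder;
  return p.hasQueueName() ? p.getQueueName() : null;
}

@Override
public void setQueueName(String queueName) {
  maybeInitBuilder();              // switch from the immutable proto to a builder
  if (queueName == null) {
    builder.clearQueueName();
    return;
  }
  builder.setQueueName(queueName); // one such getter/setter pair per simple field
}
{code}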

Some other requirements are:
* Leave other related code alone, like service implementations (e.g. 
ContainerManagerImpl).
* (If possible) Forward compatibility: developers can write their own PBImpl 
or generate them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1917) Add waitForCompletion interface to YarnClient

2014-04-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1917:


 Summary: Add waitForCompletion interface to YarnClient
 Key: YARN-1917
 URL: https://issues.apache.org/jira/browse/YARN-1917
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.4.0
Reporter: Wangda Tan


Currently, YARN doesn't have this method. Users need to write implementations 
like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob 
on their own. This feature should be helpful to end users.
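
A minimal sketch of the kind of helper this issue asks for, built only on existing 
YarnClient calls; a real waitForCompletion would likely add timeout handling and 
progress reporting:
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class YarnClientUtil {
  // Poll the RM until the application reaches a terminal state.
  public static ApplicationReport waitForCompletion(YarnClient client, ApplicationId appId)
      throws YarnException, IOException, InterruptedException {
    while (true) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state == YarnApplicationState.FINISHED
          || state == YarnApplicationState.FAILED
          || state == YarnApplicationState.KILLED) {
        return report;
      }
      Thread.sleep(1000);  // poll once per second
    }
  }
}
{code}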




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1927) Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy

2014-04-10 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1927:


 Summary: Preemption message shouldn’t be created multiple times 
for same container-id in ProportionalCapacityPreemptionPolicy
 Key: YARN-1927
 URL: https://issues.apache.org/jira/browse/YARN-1927
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.0
Reporter: Wangda Tan
Priority: Minor


Currently, after each editSchedule() call, preemption messages are created 
and sent to the scheduler. ProportionalCapacityPreemptionPolicy should only send 
a preemption message once for each container.
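
One straightforward way to enforce "once per container" (a sketch, not necessarily the 
committed fix) is to remember which container IDs have already had a message sent:
{code}
// Sketch: track containers that already triggered a preemption message, so
// repeated editSchedule() runs do not re-send for the same ContainerId.
private final Set<ContainerId> alreadyNotified = new HashSet<ContainerId>();

private boolean shouldSendPreemptionMessage(RMContainer container) {
  // Set.add() returns false when the id is already present.
  return alreadyNotified.add(container.getContainerId());
}
{code}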



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-05-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2104:


 Summary: Scheduler queue filter failed to work because index of 
queue column changed
 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan


YARN-563 added,
{code}
+ th(".type", "Application Type").
{code}
to the application table, which moves the queue's column index from 3 to 4. But on the 
scheduler page, the queue's column index is hard-coded to 3 when filtering applications 
by queue name:
{code}
  if (q == 'root') q = '';,
  else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
  $('#apps').dataTable().fnFilter(q, 3, true);,
{code}
So the queue filter will not work on the applications page.

Reproduce steps (thanks to Bo Yang for pointing this out):
{code}
1) In default setup, there’s a default queue under root queue
2) Run an arbitrary application, you can find it in “Applications” page
3) Click “Default” queue in scheduler page
4) Click “Applications”, no application will show here
5) Click “Root” queue in scheduler page
6) Click “Applications”, application will show again
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized

2014-06-05 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2124:


 Summary: ProportionalCapacityPreemptionPolicy cannot work because 
it's initialized before scheduler initialized
 Key: YARN-2124
 URL: https://issues.apache.org/jira/browse/YARN-2124
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical


While playing with the scheduler with preemption enabled, I found that 
ProportionalCapacityPreemptionPolicy cannot work. An NPE is raised when the RM 
starts:
{code}
2014-06-05 11:01:33,201 ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw 
an Exception.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
at java.lang.Thread.run(Thread.java:744)
{code}

This is because ProportionalCapacityPreemptionPolicy needs the ResourceCalculator 
from CapacityScheduler, but ProportionalCapacityPreemptionPolicy gets 
initialized before CapacityScheduler is initialized. So the ResourceCalculator is 
still null in ProportionalCapacityPreemptionPolicy.
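
One possible direction (a sketch under the assumption that the policy keeps a reference to 
the CapacityScheduler; not necessarily the committed fix) is to read the calculator lazily 
instead of caching it at init time:
{code}
// Sketch: defer the ResourceCalculator lookup to editSchedule() so it is
// picked up after CapacityScheduler has finished initializing.
@Override
public void editSchedule() {
  ResourceCalculator rc = scheduler.getResourceCalculator();
  if (rc == null) {
    return;  // scheduler not fully initialized yet; try again on the next interval
  }
  // ... proceed with containerBasedPreemptOrKill(...) using rc ...
}
{code}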



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled

2014-06-05 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2125:


 Summary: ProportionalCapacityPreemptionPolicy should only log CSV 
when debug enabled
 Key: YARN-2125
 URL: https://issues.apache.org/jira/browse/YARN-2125
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Minor
 Attachments: YARN-2125.patch

Currently, the output of logToCSV() is written with LOG.info() in 
ProportionalCapacityPreemptionPolicy, which generates non-human-readable 
text in the resource manager's log every few seconds, like:
{code}
...
2014-06-05 15:57:07,603 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
3072, 2, 3072, 2, 0, 0, 0, 0
2014-06-05 15:57:10,603 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
3072, 2, 3072, 2, 0, 0, 0, 0
...
{code}

It would be better to output it only when debug logging is enabled.
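
The guard this issue asks for is small; a sketch (logToCSV arguments elided):
{code}
// Only build and emit the CSV queue-state line when debug logging is enabled.
if (LOG.isDebugEnabled()) {
  LOG.debug(logToCSV(...));
}
{code}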



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2143) Merge common killContainer logic of Fair/Capacity scheduler into AbstractYarnScheduler

2014-06-10 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2143:


 Summary: Merge common killContainer logic of Fair/Capacity 
scheduler into AbstractYarnScheduler
 Key: YARN-2143
 URL: https://issues.apache.org/jira/browse/YARN-2143
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager, scheduler
Reporter: Wangda Tan


Currently, CapacityScheduler has a killContainer API inherited from 
PreemptableResourceScheduler, and FairScheduler uses warnOrKillContainer to do 
container preemption. We should merge the common container-kill code into 
AbstractYarnScheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-11 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2148:


 Summary: TestNMClient failed due more exit code values added and 
passed to AM
 Key: YARN-2148
 URL: https://issues.apache.org/jira/browse/YARN-2148
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0
Reporter: Wangda Tan


Currently, TestNMClient fails on trunk; see 
https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
{code}

Test cases in TestNMClient use the following code to verify the exit code of COMPLETED 
containers:
{code}
  testGetContainerStatus(container, i, ContainerState.COMPLETE,
      "Container killed by the ApplicationMaster.", Arrays.asList(
      new Integer[] {137, 143, 0}));
{code}
But YARN-2091 added logic to make the exit code reflect the actual status, so the 
exit code of a container killed by the ApplicationMaster will be -105:
{code}
  if (container.hasDefaultExitCode()) {
container.exitCode = exitEvent.getExitCode();
  }
{code}

We should update the test case as well.
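
A possible test update (sketch): also accept -105, i.e. ContainerExitStatus.KILLED_BY_APPMASTER, 
alongside the old values:
{code}
  testGetContainerStatus(container, i, ContainerState.COMPLETE,
      "Container killed by the ApplicationMaster.", Arrays.asList(
      new Integer[] {137, 143, 0, -105}));
{code}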



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2149) Test failed in TestRMAdminCLI

2014-06-11 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2149:


 Summary: Test failed in TestRMAdminCLI
 Key: YARN-2149
 URL: https://issues.apache.org/jira/browse/YARN-2149
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Wangda Tan


I noticed there are two test failures in TestRMAdminCLI:
1) testHelp
https://builds.apache.org/job/PreCommit-YARN-Build/3959//testReport/org.apache.hadoop.yarn.client/TestRMAdminCLI/testHelp/
{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
at 
org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
{code}
This is likely caused by the --forceactive option recently added to transitionToActive.

2) testTransitionToActive
https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client/TestRMAdminCLI/testTransitionToActive/
{code}
java.lang.UnsupportedOperationException: null
at java.util.AbstractList.remove(AbstractList.java:144)
at java.util.AbstractList$Itr.remove(AbstractList.java:360)
at java.util.AbstractCollection.remove(AbstractCollection.java:252)
at 
org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
at 
org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
{code}
This is caused by the list's iterator not supporting remove() (the call goes through an 
AbstractList that throws UnsupportedOperationException).
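
The stack trace points at a list whose iterator does not support remove(), which is typical 
of the fixed-size list returned by Arrays.asList(). A generic sketch of the usual fix, with 
hypothetical variable names:
{code}
// Copy the fixed-size Arrays.asList(...) view into a real ArrayList so that
// remove()/Iterator.remove() are supported.
List<String> targetIds = new ArrayList<String>(Arrays.asList(nodeIds));
targetIds.remove(currentNodeId);  // no longer throws UnsupportedOperationException
{code}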



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2149) Test failed in TestRMAdminCLI

2014-06-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-2149.
--

Resolution: Duplicate
  Assignee: Wangda Tan

 Test failed in TestRMAdminCLI
 -

 Key: YARN-2149
 URL: https://issues.apache.org/jira/browse/YARN-2149
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Wangda Tan
Assignee: Wangda Tan

 I noticed there're two test failures in TestRMAdminCLI,
 1) testHelp
 https://builds.apache.org/job/PreCommit-YARN-Build/3959//testReport/org.apache.hadoop.yarn.client/TestRMAdminCLI/testHelp/
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}
 This should be caused by --forceactive recently added to transitionToActive
 2) testTransitionToActive
 https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client/TestRMAdminCLI/testTransitionToActive/
 {code}
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 {code}
 This is caused by ArrayList doesn't implement remove interface



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2181) Add preemption info to RM Web UI

2014-06-18 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2181:


 Summary: Add preemption info to RM Web UI
 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan


We need to add preemption info to the RM web pages so that administrators/users can better 
understand the preemption that happened on an app/queue, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed

2014-06-22 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2191:


 Summary: Add a test to make sure NM will do application cleanup 
even if RM restarting happens before application completed
 Key: YARN-2191
 URL: https://issues.apache.org/jira/browse/YARN-2191
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan


In YARN-1885, there's a test in 
TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. But this is 
not enough; we need one more test to make sure the NM does app cleanup when the 
restart happens before the app finishes. The sequence is:
1. Submit app1 to RM1
2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers.
3. Restart RM1
4. Before RM1 finishes restarting, container-0 completes on NM1
5. RM1 finishes restarting, NM1 reports container-0 as completed, and app1 completes
6. RM1 should be able to notify NM1/NM2 to clean up app1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2271) Add application attempt metrics to RM Web UI/service when AppAttempt page available

2014-07-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2271:


 Summary: Add application attempt metrics to RM Web UI/service when 
AppAttempt page available
 Key: YARN-2271
 URL: https://issues.apache.org/jira/browse/YARN-2271
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Wangda Tan


Currently, we only show application metrics in the RM Web UI on the application page 
(YARN-2181). An application attempt page is planned to be added to the RM Web UI. After 
that, we should add attempt metrics to that page and to the web service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches

2014-07-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-2258.
--

Resolution: Duplicate
  Assignee: Wangda Tan

 Aggregation of MR job logs failing when Resourcemanager switches
 

 Key: YARN-2258
 URL: https://issues.apache.org/jira/browse/YARN-2258
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Wangda Tan

 1.Install RM in HA mode
 2.Run a job with more tasks
 3.Induce RM switchover while job is in progress
 Observe that log aggregation fails for the job which is running when  
 Resourcemanager switchover is induced.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-17 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2308:


 Summary: NPE happened when RM restart after CapacityScheduler 
queue configuration changed 
 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan


I encountered an NPE when the RM restarts:
{code}
2014-07-16 07:22:46,957 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
{code}
And the RM fails to restart.

This is caused by a queue configuration change: I removed some queues and added 
new queues. When the RM restarts, it tries to recover the historical applications, and 
when the queue of any of these applications has been removed, an NPE is raised.
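
One possible guard (a sketch only, with a hypothetical queueName variable; not necessarily 
the committed fix) would be to reject the recovered application cleanly when its queue no 
longer exists:
{code}
// Sketch: in CapacityScheduler, bail out instead of dereferencing a null queue
// when recovering an app whose queue was removed from capacity-scheduler.xml.
CSQueue queue = getQueue(queueName);  // queueName: queue recorded in the app's state
if (queue == null) {
  LOG.error("Queue " + queueName + " not found while recovering " + applicationId
      + "; rejecting the application instead of failing the RM.");
  // ... signal an APP_REJECTED event through the dispatcher here ...
  return;
}
{code}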



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2492) [Umbrella] Allow for (admin) labels on nodes and resource-requests

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2492:


 Summary: [Umbrella] Allow for (admin) labels on nodes and 
resource-requests 
 Key: YARN-2492
 URL: https://issues.apache.org/jira/browse/YARN-2492
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, client, resourcemanager
Reporter: Wangda Tan


Since YARN-796 is a sub-JIRA of YARN-397, this JIRA is used to create and track 
sub-tasks and attach split patches for YARN-796.

Let's keep all overall discussions on YARN-796.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2493) [YARN-796] API changes for users

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2493:


 Summary: [YARN-796] API changes for users
 Key: YARN-2493
 URL: https://issues.apache.org/jira/browse/YARN-2493
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Wangda Tan
Assignee: Wangda Tan


This JIRA includes API changes for users of YARN-796, like changes in 
{{ResourceRequest}}, {{ApplicationSubmissionContext}}, etc. This is a common 
part of YARN-796.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2494:


 Summary: [YARN-796] Node label manager API and storage 
implementations
 Key: YARN-2494
 URL: https://issues.apache.org/jira/browse/YARN-2494
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


This JIRA includes the APIs and storage implementations of the node label manager.
NodeLabelManager is an abstract class used to manage labels of nodes in the 
cluster; it has APIs to query/modify:
- Nodes according to a given label
- Labels according to a given hostname
- Add/remove labels
- Set labels of nodes in the cluster
- Persist/recover changes of labels/labels-on-nodes to/from storage

And it has two implementations to store modifications:
- Memory-based storage: it will not persist changes, so all labels will be lost 
when the RM restarts
- FileSystem-based storage: it will persist/recover to/from a FileSystem (like 
HDFS), and all labels and labels-on-nodes will be recovered upon RM restart
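
A sketch of the abstract class shape the description implies; the method names are 
illustrative, not the final API committed under YARN-2494:
{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Sketch only: names are assumptions based on the capabilities listed above.
public abstract class NodeLabelManager {
  // Label collection operations
  public abstract void addLabels(Set<String> labels) throws IOException;
  public abstract void removeLabels(Set<String> labels) throws IOException;

  // Node <-> label mapping
  public abstract void setLabelsOnNodes(Map<String, Set<String>> nodeToLabels)
      throws IOException;
  public abstract Set<String> getLabelsOnNode(String hostName);
  public abstract Set<String> getNodesWithLabel(String label);

  // Implemented differently by the memory-based and FileSystem-based stores.
  public abstract void recover() throws IOException;
}
{code}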



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2497) [YARN-796] Changes for fair scheduler to support allocate resource respect labels

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2497:


 Summary: [YARN-796] Changes for fair scheduler to support allocate 
resource respect labels
 Key: YARN-2497
 URL: https://issues.apache.org/jira/browse/YARN-2497
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2500:


 Summary: [YARN-796] Miscellaneous changes in ResourceManager to 
support labels
 Key: YARN-2500
 URL: https://issues.apache.org/jira/browse/YARN-2500
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2499) [YARN-796] Respect labels in preemption policy of fair scheduler

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2499:


 Summary: [YARN-796] Respect labels in preemption policy of fair 
scheduler
 Key: YARN-2499
 URL: https://issues.apache.org/jira/browse/YARN-2499
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2503) [YARN-796] Changes in RM Web UI to better show labels to end users

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2503:


 Summary: [YARN-796] Changes in RM Web UI to better show labels to 
end users
 Key: YARN-2503
 URL: https://issues.apache.org/jira/browse/YARN-2503
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Includes, but is not limited to:
- Show labels of nodes on the RM nodes page
- Show labels of queues on the RM scheduler page
- Warn the user/admin if the capacity of a queue cannot be guaranteed because of 
misconfigured labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2504) [YARN-796] Support get/add/remove/change labels in RM admin CLI

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2504:


 Summary: [YARN-796] Support get/add/remove/change labels in RM 
admin CLI 
 Key: YARN-2504
 URL: https://issues.apache.org/jira/browse/YARN-2504
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2505) [YARN-796] Support get/add/remove/change labels in RM REST API

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2505:


 Summary: [YARN-796] Support get/add/remove/change labels in RM 
REST API
 Key: YARN-2505
 URL: https://issues.apache.org/jira/browse/YARN-2505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Craig Welch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)

2014-09-12 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2544:


 Summary: [YARN-796] Common server side PB changes (not include 
user API PB changes)
 Key: YARN-2544
 URL: https://issues.apache.org/jira/browse/YARN-2544
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2637) maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation

2014-10-01 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2637:


 Summary: maximum-am-resource-percent will be violated when 
resource of AM is > minimumAllocation
 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Priority: Critical


Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be 
activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
     i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}

An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs 
that can be launched is 200. If the user uses 5M for each AM (> minimum_allocation), all 
apps can still be activated, and they will occupy all the resources of the queue instead 
of only max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue

2014-10-06 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2647:


 Summary: [YARN-796] Add yarn queue CLI to get queue info including 
labels of such queue
 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2665) Audit warning of registry project

2014-10-08 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2665:


 Summary: Audit warning of registry project
 Key: YARN-2665
 URL: https://issues.apache.org/jira/browse/YARN-2665
 Project: Hadoop YARN
  Issue Type: Bug
  Components: site
Reporter: Wangda Tan
Assignee: Steve Loughran
Priority: Minor


I encountered one audit warning today.

See:
https://issues.apache.org/jira/browse/YARN-2544?focusedCommentId=14164515&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14164515

It seems to be caused by the recently committed registry project.

{code}
!? 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/resources/.keep
Lines that start with ? in the release audit report indicate files that do 
not have an Apache license header.
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2685) Resource on each label not correct when multiple NMs in a same host and some has label some not

2014-10-13 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2685:


 Summary: Resource on each label not correct when multiple NMs in a 
same host and some has label some not
 Key: YARN-2685
 URL: https://issues.apache.org/jira/browse/YARN-2685
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


I noticed there's one issue: when we have multiple NMs running on the same host 
(say NM1-4 running on host1), and we specify that some of them have a label and some 
do not, the total resource on the label is not correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2694) Ensure only single node labels specified in resource request, and node label expression only specified when resourceName=ANY

2014-10-15 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2694:


 Summary: Ensure only single node labels specified in resource 
request, and node label expression only specified when resourceName=ANY
 Key: YARN-2694
 URL: https://issues.apache.org/jira/browse/YARN-2694
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Currently, node label expression support in the capacity scheduler is only partially 
completed. A node label expression specified in a ResourceRequest is only respected when 
it is specified at the ANY level, and a ResourceRequest with multiple node labels makes 
the user-limit computation tricky.

We need to temporarily disable them; changes include:
- AMRMClient
- ApplicationMasterService



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2695) Support continuously looking reserved container with node labels

2014-10-15 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2695:


 Summary: Support continuously looking reserved container with node 
labels
 Key: YARN-2695
 URL: https://issues.apache.org/jira/browse/YARN-2695
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


YARN-1769 improved the capacity scheduler to continuously look at reserved 
containers when trying to reserve/allocate resources.

This should be respected when the node/resource-request/queue has a node label.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-15 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2698:


 Summary: Move getClusterNodeLabels and getNodeToLabels to YARN CLI 
instead of RMAdminCLI
 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


The YARN RMAdminCLI and AdminService should have write APIs only; the read APIs should 
be located in the YARN CLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2699) Fix test timeout in TestResourceTrackerOnHA#testResourceTrackerOnHA

2014-10-16 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2699:


 Summary: Fix test timeout in 
TestResourceTrackerOnHA#testResourceTrackerOnHA
 Key: YARN-2699
 URL: https://issues.apache.org/jira/browse/YARN-2699
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan
Assignee: Wangda Tan


Because of the changes in YARN-2500/YARN-2496/YARN-2494, registering a node 
manager with port=0 is no longer allowed. 
TestResourceTrackerOnHA#testResourceTrackerOnHA fails since it registers a node 
manager with port = 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2705) Changes of RM node label manager configuration

2014-10-17 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2705:


 Summary: Changes of RM node label manager configuration
 Key: YARN-2705
 URL: https://issues.apache.org/jira/browse/YARN-2705
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


1) Add yarn.node-labels.manager-class; by default it will not store anything 
to the file system
2) Use the above at least in some places: RMNodeLabelsManager, RMAdminCLI. Convert 
{{DummyNodeLabelsManager}} into a {{MemoryNodeLabelsManager}}
3) Document that the RM configs and client configs for 
yarn.node-labels.manager-class should match
4) Rename fs-store.uri to fs-store.root-dir
5) Similarly FS_NODE_LABELS_STORE_URI
6) For the default value of fs-store.uri, put it in /tmp, but create 
/tmp/hadoop-yarn-$\{user\}/node-labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2710) RM HA tests failed intermittently on trunk

2014-10-17 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2710:


 Summary: RM HA tests failed intermittently on trunk
 Key: YARN-2710
 URL: https://issues.apache.org/jira/browse/YARN-2710
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Wangda Tan


Failures like the following can happen in TestApplicationClientProtocolOnHA, 
TestResourceTrackerOnHA, etc.
{code}
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
  Time elapsed: 9.491 sec  <<< ERROR!
java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 to 
asf905.gq1.ygridcore.net:28032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
at 
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled

2014-10-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2740:


 Summary: RM AdminService should prevent admin change labels on 
nodes when distributed node label configuration enabled
 Key: YARN-2740
 URL: https://issues.apache.org/jira/browse/YARN-2740
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


According to YARN-2495, labels of nodes will be specified when the NM does its heartbeat. 
We shouldn't allow the admin to modify labels on nodes when distributed node label 
configuration is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2778) YARN node CLI should display labels on returned node reports

2014-10-30 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2778:


 Summary: YARN node CLI should display labels on returned node 
reports
 Key: YARN-2778
 URL: https://issues.apache.org/jira/browse/YARN-2778
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2786) Create yarn node-labels CLI to enable list node labels collection and node labels mapping

2014-10-31 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2786:


 Summary: Create yarn node-labels CLI to enable list node labels 
collection and node labels mapping
 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


With YARN-2778, we can list node labels on existing RM nodes. But that is not 
enough; we should be able to: 
1) list the node labels collection 
2) list node-to-label mappings even if the node hasn't registered with the RM.

The command should start with yarn node-labels ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-02 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2800:


 Summary: Should print WARN log in both RM/RMAdminCLI side when 
MemoryRMNodeLabelsManager is enabled
 Key: YARN-2800
 URL: https://issues.apache.org/jira/browse/YARN-2800
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Even though we have documented this, it would be better to explicitly print 
a message on both the RM and RMAdminCLI side saying that the node labels being 
added will be lost across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2014-11-04 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2807:


 Summary: Option --forceactive not works as described in usage of 
yarn rmadmin -transitionToActive
 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan


Currently the help message of yarn rmadmin -transitionToActive is:
{code}
transitionToActive: incorrect number of arguments
Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
{code}
But --forceactive does not work as expected. When transitioning the RM state with 
--forceactive:
{code}
yarn rmadmin -transitionToActive rm2 --forceactive
Automatic failover is enabled for 
org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.
{code}
As shown above, we still cannot transitionToActive when automatic failover is 
enabled, even with --forceactive.

The option that does work is {{--forcemanual}}, but no place in the usage describes 
this option. I think we should fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2824) Capacity of labels should be zero by default

2014-11-06 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2824:


 Summary: Capacity of labels should be zero by default
 Key: YARN-2824
 URL: https://issues.apache.org/jira/browse/YARN-2824
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical


In the existing Capacity Scheduler behavior, if the user doesn't specify the capacity of 
a label, queue initialization fails. That causes queue refresh to fail when a new label 
is added to the node labels collection without modifying capacity-scheduler.xml.

With this patch, the capacity of a label should be explicitly set if the user wants to 
use it. If the user doesn't set the capacity of some labels, we will treat such labels 
as unused labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2827) Fix bugs of yarn queue CLI

2014-11-06 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2827:


 Summary: Fix bugs of yarn queue CLI
 Key: YARN-2827
 URL: https://issues.apache.org/jira/browse/YARN-2827
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical


Bugs that need fixing:
1) The args passed to the queue CLI do not include "queue": even if you run yarn queue 
-status .., the args are [-status, ...]. The assumption is incorrect.
2) It is possible that there's no QueueInfo for the specified queue name and null is 
returned from YarnClient, so an NPE is raised. Added a check for it, and it will print 
a proper message.
3) When failing to get the QueueInfo, a non-zero exit code should be returned.
4) Add tests for the above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2866) Capacity scheduler preemption policy should respect yarn.scheduler.minimum-allocation-mb when computing resource of queues

2014-11-14 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2866:


 Summary: Capacity scheduler preemption policy should respect 
yarn.scheduler.minimum-allocation-mb when computing resource of queues
 Key: YARN-2866
 URL: https://issues.apache.org/jira/browse/YARN-2866
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan


Currently, capacity scheduler preemption logic doesn't respect 
minimum_allocation when computing ideal_assign/guaranteed_resource, etc. We 
should respect it to avoid some potential rounding issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2869) CapacityScheduler should trim sub queue names when parse configuration

2014-11-14 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2869:


 Summary: CapacityScheduler should trim sub queue names when parse 
configuration
 Key: YARN-2869
 URL: https://issues.apache.org/jira/browse/YARN-2869
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Currently, the capacity scheduler doesn't trim sub-queue names when parsing queue 
names. For example, the configuration

{code}
<configuration>
  <property>
    <name>...root.queues</name>
    <value> a, b  , c</value>
  </property>

  <property>
    <name>...root.b.capacity</name>
    <value>100</value>
  </property>

  ...
</configuration>
{code}

Will fail with error: 
{code}
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for queue root. a 
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getCapacity(CapacitySchedulerConfiguration.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getCapacityFromConf(LeafQueue.java:196)

{code}

It will try to find queues with the names " a", " b  ", and " c", which is 
apparently wrong; we should trim these sub-queue names.
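
The kind of fix the issue suggests, as a sketch (parentQueuePath is a hypothetical 
variable; the real key prefix is elided just like in the config above):
{code}
// Trim each sub-queue name when reading "<parent>.queues", so " b  " becomes "b"
// and the per-queue capacity keys resolve correctly. Alternatively,
// Configuration.getTrimmedStrings() does the trimming in one call.
String[] queues = conf.getStrings(parentQueuePath + ".queues");  // e.g. " a, b  , c"
for (int i = 0; i < queues.length; i++) {
  queues[i] = queues[i].trim();
}
{code}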



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2014-11-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2880:


 Summary: Add a test in TestRMRestart to make sure node labels will 
be recovered if it is enabled
 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan


As suggested by [~ozawa], 
[link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
 We should have such a test to make sure there is no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2895) Integrate distributed scheduling with capacity scheduler

2014-11-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2895:


 Summary: Integrate distributed scheduling with capacity scheduler
 Key: YARN-2895
 URL: https://issues.apache.org/jira/browse/YARN-2895
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan


There are some benefits to integrating the distributed scheduling mechanism (LocalRM) 
with the capacity scheduler:
- Resource usage of opportunistic containers can be tracked by the central RM, and 
capacity can be enforced
- Opportunity to transfer an opportunistic container to a conservative container 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed

2014-12-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2920:


 Summary: CapacityScheduler should be notified when labels on nodes 
changed
 Key: YARN-2920
 URL: https://issues.apache.org/jira/browse/YARN-2920
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan


Currently, changes to labels on nodes are only handled by RMNodeLabelsManager, 
but that is not enough when labels on nodes change:
- The scheduler should be able to take actions on running containers (like 
kill/preempt/do-nothing).
- Used/available capacity in the scheduler should be updated for future planning.

We need to add a new event to pass such updates to the scheduler (a rough sketch 
follows).
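Purely illustrative shape of such an event (class and method names are hypothetical, 
not a committed API): it would carry the node and its updated label set so the 
scheduler can act on running containers and refresh capacity.

{code}
import java.util.Collections;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

public class NodeLabelsUpdatedEventSketch {
  private final NodeId nodeId;
  private final Set<String> updatedLabels;

  public NodeLabelsUpdatedEventSketch(NodeId nodeId, Set<String> updatedLabels) {
    this.nodeId = nodeId;
    this.updatedLabels = Collections.unmodifiableSet(updatedLabels);
  }

  public NodeId getNodeId() {
    return nodeId;
  }

  // Labels now on the node; the scheduler compares these with the old labels
  // to decide whether to kill/preempt containers and to update capacity.
  public Set<String> getUpdatedLabels() {
    return updatedLabels;
  }
}
{code}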



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2925) Internal fields in LeafQueue access should be protected when accessed from FiCaSchedulerApp to calculate Headroom

2014-12-04 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2925:


 Summary: Internal fields in LeafQueue access should be protected 
when accessed from FiCaSchedulerApp to calculate Headroom
 Key: YARN-2925
 URL: https://issues.apache.org/jira/browse/YARN-2925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical


With YARN-2644, FiCaScheduler will calculate an up-to-date headroom before 
sending the Allocation response back to the AM.

The headroom calculation happens on the LeafQueue side and uses fields like used 
resource, etc. But it is not protected by any LeafQueue lock, so it might be 
corrupted if someone else is editing those fields (one possible guard is sketched 
below).
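One possible way to guard the calculation, sketched with hypothetical field names (not 
the actual LeafQueue fix): read the fields headroom depends on under the queue's lock 
into a consistent snapshot, then compute headroom from that snapshot.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HeadroomSnapshotSketch {
  private final Object queueLock = new Object();
  private Resource queueUsed = Resources.createResource(0);
  private Resource currentResourceLimit = Resources.createResource(0);

  void incUsed(Resource delta) {
    synchronized (queueLock) {
      Resources.addTo(queueUsed, delta);
    }
  }

  Resource computeHeadroom() {
    Resource used;
    Resource limit;
    synchronized (queueLock) {
      // Copy the fields while holding the lock so a concurrent update
      // cannot produce a torn read.
      used = Resources.clone(queueUsed);
      limit = Resources.clone(currentResourceLimit);
    }
    // Compute outside the lock using the consistent snapshot.
    return Resources.subtract(limit, used);
  }
}
{code}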



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-08 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2933:


 Summary: Capacity Scheduler preemption policy should only consider 
capacity without labels temporarily
 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan


Currently, we have capacity enforcement on each queue for each label in 
CapacityScheduler, but we don't have a preemption policy to support that. 
YARN-2498 is targeting preemption that respects node labels, but we have some 
gaps in the code base: for example, queues/FiCaScheduler should be able to get 
usedResource/pendingResource, etc. by label. These items potentially require 
refactoring CS, which we need to spend some time thinking about carefully.

For now, what we can do immediately is calculate ideal_allocation and preempt 
containers only for resources on nodes without labels, to avoid regressions like 
the following: a cluster has some nodes with labels and some without; assume 
queueA isn't satisfied for resources without labels, but today the preemption 
policy may preempt resources from nodes with labels for queueA, which is not 
correct.

Again, this is just a short-term enhancement; YARN-2498 will consider preemption 
respecting node labels for the Capacity Scheduler, which is our final target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2943) Add a node-labels page in RM web UI

2014-12-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2943:


 Summary: Add a node-labels page in RM web UI
 Key: YARN-2943
 URL: https://issues.apache.org/jira/browse/YARN-2943
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Now we have node labels in the system, but there's no very convenient way to get 
information like "how many active NMs are assigned to a given label?", "how much 
total resource is there for a given label?", "for a given label, which queues can 
access it?", etc.

It would be better to add a node-labels page in the RM web UI, so users/admins can 
have a centralized view of such information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3098) Create common QueueCapacities class in Capacity Scheduler to track capacities-by-labels of queues

2015-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3098:


 Summary: Create common QueueCapacities class in Capacity Scheduler 
to track capacities-by-labels of queues
 Key: YARN-3098
 URL: https://issues.apache.org/jira/browse/YARN-3098
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan


Similar to YARN-3092, after YARN-796 queues (ParentQueue and LeafQueue) 
need to track capacities-by-label (e.g. capacity, maximum-capacity, 
absolute-capacity, absolute-maximum-capacity, etc.). It's better to have a 
class that encapsulates these capacities, for both better 
maintainability/readability and fine-grained locking; a rough sketch is below.
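A rough sketch of what such an encapsulating class could look like (names are 
illustrative; this is not the actual QueueCapacities implementation): one object per 
queue holding per-label capacity values, so accessors and locking live in a single 
place.

{code}
import java.util.HashMap;
import java.util.Map;

public class QueueCapacitiesSketch {
  private static class Capacities {
    float capacity;
    float maximumCapacity;
    float absoluteCapacity;
    float absoluteMaximumCapacity;
  }

  // label -> capacities configured/derived for that label
  private final Map<String, Capacities> capacitiesByLabel = new HashMap<>();

  public synchronized float getCapacity(String label) {
    Capacities c = capacitiesByLabel.get(label);
    return c == null ? 0f : c.capacity;
  }

  public synchronized void setCapacity(String label, float value) {
    Capacities c = capacitiesByLabel.get(label);
    if (c == null) {
      c = new Capacities();
      capacitiesByLabel.put(label, c);
    }
    c.capacity = value;
  }
}
{code}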



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track resources-by-label.

2015-01-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3099:


 Summary: Capacity Scheduler LeafQueue/ParentQueue should use 
ResourceUsage to track resources-by-label.
 Key: YARN-3099
 URL: https://issues.apache.org/jira/browse/YARN-3099
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3142) Improve locks in AppSchedulingInfo

2015-02-04 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3142:


 Summary: Improve locks in AppSchedulingInfo
 Key: YARN-3142
 URL: https://issues.apache.org/jira/browse/YARN-3142
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3139) Improve locks in AbstractYarnScheduler/CapacityScheduler/FairScheduler

2015-02-04 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3139:


 Summary: Improve locks in 
AbstractYarnScheduler/CapacityScheduler/FairScheduler
 Key: YARN-3139
 URL: https://issues.apache.org/jira/browse/YARN-3139
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Wangda Tan
Assignee: Li Lu


Enhance locks in AbstractYarnScheduler/CapacityScheduler/FairScheduler. As 
mentioned in YARN-3091, a possible solution is using a read/write lock (see the 
sketch below). Other fine-grained locks for specific purposes/bugs should be 
addressed in separate tickets.
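A sketch of the read/write-lock pattern mentioned in YARN-3091 (field and method names 
are illustrative, not the actual scheduler code): lookups take the read lock so they 
can run concurrently, while mutations take the write lock.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SchedulerLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private int allocatedContainers = 0;

  // Read path: many callers can hold the read lock at the same time.
  int getAllocatedContainers() {
    lock.readLock().lock();
    try {
      return allocatedContainers;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Write path: exclusive while scheduler state is mutated.
  void recordAllocation() {
    lock.writeLock().lock();
    try {
      allocatedContainers++;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}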



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label

2015-02-02 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3124:


 Summary: Capacity Scheduler LeafQueue/ParentQueue should use 
QueueCapacities to track capacities-by-label
 Key: YARN-3124
 URL: https://issues.apache.org/jira/browse/YARN-3124
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits

2015-02-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-281.
-
  Resolution: Won't Fix
Release Note: 
I think this may not be needed since we already have tests in TestSchedulerUtils, 
which verify minimum/maximum resource normalization/verification. And 
SchedulerUtils runs before the scheduler can see such resource requests.

Resolved it as Won't Fix.

 Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
 -

 Key: YARN-281
 URL: https://issues.apache.org/jira/browse/YARN-281
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Wangda Tan
  Labels: test

 We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler 
 and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test 
 to prevent regressions of any kind on such limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3153) Capacity Scheduler max AM resource percentage is mis-used as ratio

2015-02-06 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3153:


 Summary: Capacity Scheduler max AM resource percentage is mis-used 
as ratio
 Key: YARN-3153
 URL: https://issues.apache.org/jira/browse/YARN-3153
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical


The existing Capacity Scheduler can limit the max applications running within a 
queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but it 
is actually used as a ratio: the implementation assumes the input will be in 
\[0,1\]. So a user can currently specify it up to 100, which lets AMs use 100x the 
queue capacity. We should fix that (a validation sketch is below).
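An illustrative validation sketch (not the actual CapacitySchedulerConfiguration 
code): reject values outside [0, 1] instead of silently treating e.g. 100 as a valid 
ratio.

{code}
public class MaxAmResourcePercentCheck {
  // Validates yarn.scheduler.capacity.maximum-am-resource-percent as a ratio.
  static float checkMaxAmResourcePercent(float configured) {
    if (configured < 0.0f || configured > 1.0f) {
      throw new IllegalArgumentException(
          "maximum-am-resource-percent should be in range [0, 1], but got "
              + configured);
    }
    return configured;
  }
}
{code}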



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated

2015-02-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3132:


 Summary: RMNodeLabelsManager should remove node from node-to-label 
mapping when node becomes deactivated
 Key: YARN-3132
 URL: https://issues.apache.org/jira/browse/YARN-3132
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


Using an example to explain:
1) Admin specifies host1 has label=x
2) node=host1:123 registers
3) Get node-to-label mapping: returns host1/host1:123
4) node=host1:123 unregisters
5) Get node-to-label mapping: still returns host1:123

Probably we should remove host1:123 when it becomes deactivated and no label is 
directly assigned to it (directly assigned means the admin specified that host1:123 
has x instead of host1 has x).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3213) Respect labels in Capacity Scheduler when computing user-limit

2015-02-18 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3213:


 Summary: Respect labels in Capacity Scheduler when computing 
user-limit
 Key: YARN-3213
 URL: https://issues.apache.org/jira/browse/YARN-3213
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan


Now we support node-labels in the Capacity Scheduler, but user-limit computation 
doesn't fully respect node-labels; we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3233) Implement scheduler common configuration parser and create abstraction layer in CapacityScheduler to support plain/hierarchy configuration.

2015-02-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3233:


 Summary: Implement scheduler common configuration parser and 
create abstraction layer in CapacityScheduler to support plain/hierarchy 
configuration.
 Key: YARN-3233
 URL: https://issues.apache.org/jira/browse/YARN-3233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3214) Adding non-exclusive node labels

2015-02-18 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3214:


 Summary: Adding non-exclusive node labels 
 Key: YARN-3214
 URL: https://issues.apache.org/jira/browse/YARN-3214
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Currently node labels partition the cluster into sub-clusters, so resources 
cannot be shared between the partitions. 

With the current implementation of node labels we cannot use the cluster 
optimally, and the throughput of the cluster will suffer.

We are proposing adding non-exclusive node labels:

1. Labeled apps get preference on labeled nodes. 
2. If there is no ask for labeled resources, we can assign those nodes to 
non-labeled apps.
3. If there is any future ask for those resources, we will preempt the 
non-labeled apps and give the resources back to labeled apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-02-18 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3216:


 Summary: Max-AM-Resource-Percentage should respect node labels
 Key: YARN-3216
 URL: https://issues.apache.org/jira/browse/YARN-3216
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp

2015-01-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3026:


 Summary: Move application-specific container allocation logic from 
LeafQueue to FiCaSchedulerApp
 Key: YARN-3026
 URL: https://issues.apache.org/jira/browse/YARN-3026
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan


Had a discussion with [~vinodkv] and [~jianhe]: 

In the existing Capacity Scheduler, all allocation logic at and below LeafQueue 
is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd better 
move some of it to FiCaSchedulerApp.

The ideal scope of LeafQueue should be: a LeafQueue receives some resources 
from its ParentQueue (like 15% of cluster resource) and distributes those 
resources to its child apps, while staying agnostic to the internal logic of the 
child apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how 
an application allocates containers from the given resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager

2015-01-07 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3016:


 Summary: (Refactoring) Merge internalAdd/Remove/ReplaceLabels to 
one method in CommonNodeLabelsManager
 Key: YARN-3016
 URL: https://issues.apache.org/jira/browse/YARN-3016
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


Now we have separate but similar implementations for add/remove/replace labels 
on nodes in CommonNodeLabelsManager; we should merge them into a single method 
for easier modification and better readability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3014) Changing labels on a host should update all NM's labels on that host

2015-01-07 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3014:


 Summary: Changing labels on a host should update all NM's labels 
on that host
 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3234) Add changes in CapacityScheduler to use the abstracted configuration layer

2015-02-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3234:


 Summary: Add changes in CapacityScheduler to use the abstracted 
configuration layer
 Key: YARN-3234
 URL: https://issues.apache.org/jira/browse/YARN-3234
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3235) Support uniformed scheduler configuration in FairScheduler

2015-02-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3235:


 Summary: Support uniformed scheduler configuration in FairScheduler
 Key: YARN-3235
 URL: https://issues.apache.org/jira/browse/YARN-3235
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-03-16 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3354:


 Summary: Container should contains node-labels asked by original 
ResourceRequests
 Key: YARN-3354
 URL: https://issues.apache.org/jira/browse/YARN-3354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, capacityscheduler, nodemanager, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


We proposed non-exclusive node labels in YARN-3214, which lets non-labeled resource 
requests be allocated on labeled nodes that have idle resources.

To make preemption work, we need to know an allocated container's original node 
label: when labeled resource requests come back, we need to kill non-labeled 
containers running on labeled nodes.

This requires adding node-labels to Container; also, the NM needs to store this 
information and send it back to the RM on RM restart to recover the original 
container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3356) Capacity Scheduler LeafQueue.User/FiCaSchedulerApp should use ResourceUsage to track used-resources-by-label.

2015-03-16 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3356:


 Summary: Capacity Scheduler LeafQueue.User/FiCaSchedulerApp should 
use ResourceUsage to track used-resources-by-label.
 Key: YARN-3356
 URL: https://issues.apache.org/jira/browse/YARN-3356
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Similar to YARN-3099, the Capacity Scheduler's LeafQueue.User/FiCaSchedulerApp 
should use ResourceUsage to track resource-usage/pending by label, for better 
resource tracking and preemption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-03-17 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3361:


 Summary: CapacityScheduler side changes to support non-exclusive 
node labels
 Key: YARN-3361
 URL: https://issues.apache.org/jira/browse/YARN-3361
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


Referring to the design doc attached to YARN-3214, these are the 
CapacityScheduler-side changes to support non-exclusive node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3362) Add node label usage in RM CapacityScheduler web UI

2015-03-17 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3362:


 Summary: Add node label usage in RM CapacityScheduler web UI
 Key: YARN-3362
 URL: https://issues.apache.org/jira/browse/YARN-3362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, webapp
Reporter: Wangda Tan


We don't show node label usage in the RM CapacityScheduler web UI now; without 
this, it is hard for users to understand what is happening on nodes that have 
labels assigned to them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3383) AdminService should use warn instead of info to log exception when operation fails

2015-03-20 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3383:


 Summary: AdminService should use warn instead of info to log 
exception when operation fails
 Key: YARN-3383
 URL: https://issues.apache.org/jira/browse/YARN-3383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Wangda Tan


Now it uses info:
{code}
  private YarnException logAndWrapException(IOException ioe, String user,
      String argName, String msg) throws YarnException {
    LOG.info("Exception " + msg, ioe);
{code}
But it should use warn instead.
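A standalone sketch of the proposed change (assuming commons-logging, which the RM 
code uses; the class and method names here are illustrative): log the failure at warn 
level so it stands out in the RM logs.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AdminLogLevelSketch {
  private static final Log LOG = LogFactory.getLog(AdminLogLevelSketch.class);

  static void logOperationFailure(String msg, Exception cause) {
    // warn instead of info: an admin operation failing is not routine output.
    LOG.warn("Exception " + msg, cause);
  }
}
{code}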



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3298) User-limit should be enforced in CapacityScheduler

2015-03-05 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3298:


 Summary: User-limit should be enforced in CapacityScheduler
 Key: YARN-3298
 URL: https://issues.apache.org/jira/browse/YARN-3298
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, yarn
Reporter: Wangda Tan
Assignee: Wangda Tan


User-limit is not treated as a hard limit for now: it does not consider 
required-resource (the resource of the resource request being allocated). Also, 
when a user's used resource equals the user-limit, allocation still continues. 
This will generate jitter issues when we have YARN-2069 (the preemption policy 
kills a container under a user, and the scheduler allocates a container under the 
same user soon after).

The expected behavior should be the same as queue capacity:
only when user.usage + required <= user-limit will the queue continue to allocate 
containers (see the sketch below).
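A sketch of the proposed hard check (Resources#add and Resources#lessThanOrEqual are 
existing YARN resource helpers, but the method below is illustrative, not the actual 
LeafQueue code): only keep allocating while current usage plus the resource being 
allocated stays within the computed user limit.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class UserLimitCheckSketch {
  // user.usage + required <= user-limit, using the scheduler's ResourceCalculator.
  static boolean canAssignToUser(ResourceCalculator rc, Resource clusterResource,
      Resource userUsage, Resource required, Resource userLimit) {
    return Resources.lessThanOrEqual(rc, clusterResource,
        Resources.add(userUsage, required), userLimit);
  }
}
{code}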



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3340) Mark setters to be @Public for ApplicationId/ApplicationAttemptId/ContainerId.

2015-03-12 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3340:


 Summary: Mark setters to be @Public for 
ApplicationId/ApplicationAttemptId/ContainerId.
 Key: YARN-3340
 URL: https://issues.apache.org/jira/browse/YARN-3340
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker


Currently, the setters of ApplicationId/ApplicationAttemptId/ContainerId are all 
private. That's not correct -- users' applications need to set such ids to query 
status / submit applications, etc.

We need to mark such setters as public to avoid downstream applications 
encountering compilation errors when changes are made to these setters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3346) Deadlock in Capacity Scheduler

2015-03-12 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-3346.
--
   Resolution: Implemented
Fix Version/s: 2.6.1
   2.7.0

This issue is already resolved: YARN-3251 is the 2.6.1 fix, and YARN-3265 is the 
2.7.0 fix. 

 Deadlock in Capacity Scheduler
 --

 Key: YARN-3346
 URL: https://issues.apache.org/jira/browse/YARN-3346
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Suma Shivaprasad
 Fix For: 2.7.0, 2.6.1

 Attachments: rm.deadlock_jstack


 {noformat}
 Found one Java-level deadlock:
 =
 2144051991@qtp-383501499-6:
   waiting to lock monitor 0x7fa700eec8e8 (object 0x0004589fec18, a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp),
   which is held by ResourceManager Event Processor
 ResourceManager Event Processor:
   waiting to lock monitor 0x7fa700aadf88 (object 0x000441c05ec8, a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
   which is held by IPC Server handler 0 on 54311
 IPC Server handler 0 on 54311:
   waiting to lock monitor 0x7fa700e20798 (object 0x000441d867f8, a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
   which is held by ResourceManager Event Processor
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API

2015-03-12 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3345:


 Summary: Add non-exclusive node label RMAdmin CLI/API
 Key: YARN-3345
 URL: https://issues.apache.org/jira/browse/YARN-3345
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


As described in YARN-3214 (see the design doc attached to that JIRA), we need to 
add the non-exclusive node label RMAdmin API and CLI implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3279) AvailableResource of QueueMetrics should consider queue's current-max-limit

2015-02-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3279:


 Summary: AvailableResource of QueueMetrics should consider queue's 
current-max-limit
 Key: YARN-3279
 URL: https://issues.apache.org/jira/browse/YARN-3279
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan


Now, the available resource of a queue doesn't consider the queue's 
current-max-limit, but the available resource of a user already considers that; we 
should make them consistent.

In addition, we can organize the code better: the computation of AvailableResource 
for QueueMetrics/UserMetrics is currently placed in two places, and we should 
merge them into a single place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3277) Queue's current-max-limit should be updated before allocate reserved container.

2015-02-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3277:


 Summary: Queue's current-max-limit should be updated before 
allocate reserved container.
 Key: YARN-3277
 URL: https://issues.apache.org/jira/browse/YARN-3277
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


This is introduced by the changes of YARN-3265. With YARN-2008, when the RM 
allocates a reserved container, it goes to the LeafQueue directly and then goes up 
to root to get the LeafQueue's current-max-limit correct. Now we no longer go up, 
so the LeafQueue cannot get maxQueueLimit updated before allocating a reserved 
container.

One possible solution is that we can still start from root when allocating a 
reserved container, but go from top to bottom down to the LeafQueue the reserved 
container belongs to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-02-23 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3243:


 Summary: CapacityScheduler should pass headroom from parent to 
children to make sure ParentQueue obey its capacity limits.
 Key: YARN-3243
 URL: https://issues.apache.org/jira/browse/YARN-3243
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan


Now CapacityScheduler has some issues making sure a ParentQueue always obeys its 
capacity limits, for example:
1) When allocating a container under a parent queue, it only checks 
parentQueue.usage < parentQueue.max. If a leaf queue allocates a container with 
size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
resource limit, as in the following example:
{code}
A  (usage=54, max=55)
   / \
  A1 A2 (usage=1, max=55)
(usage=53, max=53)
{code}
Queue A2 is able to allocate a container since its usage < max, but if we do 
that, A's usage can exceed A.max.

2) When doing the continuous reservation check, the parent queue only tells its 
children "you need to unreserve *some* resource, so that I will be less than my 
maximum resource", but it does not tell them how much resource needs to be 
unreserved. This may lead to the parent queue exceeding its configured maximum 
capacity as well.

With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each queue 
class. *Here is my proposal* (a sketch follows the list):
- ParentQueue will set its children's ResourceUsage.headroom, which means the 
*maximum resource its children can allocate*.
- ParentQueue will set each child's headroom to (saying the parent's name is 
qA): min(qA.headroom, qA.max - qA.used). This makes sure qA's ancestors' 
capacities are enforced as well (qA.headroom is set by qA's parent).
- {{needToUnReserve}} is not necessary; instead, children can compute how much 
resource needs to be unreserved to keep their parent within its resource limit.
- Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue 
and FiCaSchedulerApp; headroom will consider user-limit, etc.
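A sketch of the headroom propagation in the second bullet (the method is illustrative, 
using the existing Resources#min and Resources#subtract helpers): the parent qA hands 
each child min(qA.headroom, qA.max - qA.used).

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HeadroomPropagationSketch {
  // Headroom the parent qA hands down to each of its children.
  static Resource childHeadroom(ResourceCalculator rc, Resource clusterResource,
      Resource qaHeadroom, Resource qaMax, Resource qaUsed) {
    return Resources.min(rc, clusterResource,
        qaHeadroom, Resources.subtract(qaMax, qaUsed));
  }
}
{code}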



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-02-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-3003.
--
Resolution: Duplicate

[~varun_saxena],
Thanks for the reminder. I just reopened and then resolved this as duplicate, 
since the patch for this JIRA was divided into two other JIRAs; no code was 
actually committed for this one.

 Provide API for client to retrieve label to node mapping
 

 Key: YARN-3003
 URL: https://issues.apache.org/jira/browse/YARN-3003
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Reporter: Ted Yu
Assignee: Varun Saxena
 Attachments: YARN-3003.001.patch, YARN-3003.002.patch


 Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
 of labels associated with the node.
 Client (such as Slider) may be interested in label to node mapping - given 
 label, return the nodes with this label.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-02-25 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3265:


 Summary: CapacityScheduler deadlock when computing absolute max 
avail capacity (fix for trunk/branch-2)
 Key: YARN-3265
 URL: https://issues.apache.org/jira/browse/YARN-3265
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


This patch is trying to solve the same problem described in YARN-3251, but this 
is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3251) Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-3251.
--
   Resolution: Fixed
Fix Version/s: 2.6.1
 Hadoop Flags: Reviewed

Just compiled and ran all CapacityScheduler tests, and committed to branch-2.6. 

Thanks [~cwelch], and also thanks for the reviews from [~jlowe], [~sunilg] and 
[~vinodkv].

 Fix CapacityScheduler deadlock when computing absolute max avail capacity 
 (short term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Fix For: 2.6.1

 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

