[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126621#comment-14126621
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Thanks for updating the patch. Sorry for the delay; I've just taken a look at your 
patch. The newly added test looks good to me; I only have a couple of minor 
comments:
1.
{code}
+// verify capacity taken from queueB, not queueE despite queueE being far
+// over its absolute guaranteed capacity
{code}
queueE isn't preempted because its parent queue is still under-satisfied; as you 
know, that's an internal mechanism of the preemption policy. I think it's better to 
add this to the comment, as it can save some time for people looking at the test. 

2.
{code}
+ApplicationAttemptId expectedAttemptOnQueueB = 
+ApplicationAttemptId.newInstance(
+appA.getApplicationId(), appA.getAttemptId());
+assertTrue("appA should be running on queueB",
+mCS.getAppsInQueue("queueB").contains(expectedAttemptOnQueueB));
{code}
It's better to remove such assertions; they're unrelated to the preemption policy. I 
guess you added them here because you want to check that mockQueue/mockApp are set up 
correctly. I suggest adding a separate test to verify the mocked nested queues/apps.

The same applies to the similar checks on queueC/queueE. 
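
For illustration, such a separate sanity test might look roughly like the following (a 
minimal sketch; mCS and appA are the names from the snippet above, while the test name 
and the queueC/queueE assertions are made up):
{code}
@Test
public void testNestedQueueAppMocking() {
  // verify only that the mocked nested queues map to the expected mock apps;
  // no preemption logic is exercised here
  ApplicationAttemptId attemptA = ApplicationAttemptId.newInstance(
      appA.getApplicationId(), appA.getAttemptId());
  assertTrue("appA should be running on queueB",
      mCS.getAppsInQueue("queueB").contains(attemptA));
  // ... similar assertions for the apps mocked on queueC and queueE
}
{code}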

3. The failed test TestCapacitySchedulerQueueACLs should not be related to this 
change, but it's better to re-kick Jenkins to make sure.

Thanks,
Wangda


> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt
>
>
> We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126591#comment-14126591
 ] 

Hadoop QA commented on YARN-1458:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12667336/YARN-1458.alternative2.patch
  against trunk revision 7498dd7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4854//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4854//console

This message is automatically generated.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, 
> yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submit lots jobs, it is not easy to reapear. We run the test cluster 
> for days to reapear it. The output of  jstack command on resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.

[jira] [Commented] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126580#comment-14126580
 ] 

Rohith commented on YARN-2523:
--

Decommissioned-node metrics are set by the NodesListManager. If a decommissioned node 
rejoins, RMNodeImpl#updateMetricsForRejoinedNode() decrements the metric by 1 again, 
which causes the negative value.

There should be a check in RMNodeImpl#updateMetricsForRejoinedNode() for the 
decommissioned state:
{code}
if (!excludedHosts.contains(hostName)
    && !excludedHosts.contains(NetUtils.normalizeHostName(hostName))) {
  metrics.decrDecommisionedNMs();
}
{code}
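
For context, a rough sketch of where such a check could sit (the method body below is 
paraphrased rather than quoted from RMNodeImpl, only the DECOMMISSIONED branch is shown, 
and how the exclude list would be looked up from inside RMNodeImpl is left open here):
{code}
private void updateMetricsForRejoinedNode(NodeState previousNodeState) {
  ClusterMetrics metrics = ClusterMetrics.getMetrics();
  switch (previousNodeState) {
    case DECOMMISSIONED:
      // proposed guard: decrement the decommissioned-NM counter only when the
      // host is not (or no longer) present in the exclude list
      if (!excludedHosts.contains(hostName)
          && !excludedHosts.contains(NetUtils.normalizeHostName(hostName))) {
        metrics.decrDecommisionedNMs();
      }
      break;
    // ... other previous states keep their existing handling
    default:
      break;
  }
}
{code}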

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Assignee: Rohith
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-08 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2523:

Affects Version/s: (was: 2.4.1)
   3.0.0

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 3.0.0
>Reporter: Nishan Shetty
>Assignee: Rohith
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-08 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2523:

Priority: Major  (was: Minor)

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Assignee: Rohith
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-08 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-2523:


Assignee: Rohith

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> --
>
> Key: YARN-2523
> URL: https://issues.apache.org/jira/browse/YARN-2523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Minor
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2524) ResourceManager UI shows negative value for "Decommissioned Nodes" field

2014-09-08 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty resolved YARN-2524.
-
Resolution: Invalid

Two issues were created by mistake.

> ResourceManager UI shows negative value for "Decommissioned Nodes" field
> 
>
> Key: YARN-2524
> URL: https://issues.apache.org/jira/browse/YARN-2524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Nishan Shetty
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2524) ResourceManager UI shows negative value for "Decommissioned Nodes" field

2014-09-08 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2524:
---

 Summary: ResourceManager UI shows negative value for 
"Decommissioned Nodes" field
 Key: YARN-2524
 URL: https://issues.apache.org/jira/browse/YARN-2524
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Nishan Shetty


1. Decommission one NodeManager by configuring ip in excludehost file
2. Remove ip from excludehost file
3. Execute -refreshNodes command and restart Decommissioned NodeManager

Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field

2014-09-08 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2523:
---

 Summary: ResourceManager UI showing negative value for 
"Decommissioned Nodes" field
 Key: YARN-2523
 URL: https://issues.apache.org/jira/browse/YARN-2523
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor


1. Decommission one NodeManager by configuring ip in excludehost file
2. Remove ip from excludehost file
3. Execute -refreshNodes command and restart Decommissioned NodeManager

Observe that in RM UI negative value for "Decommissioned Nodes" field is shown



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2463) Add total cluster capacity to AllocateResponse

2014-09-08 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev resolved YARN-2463.
-
Resolution: Invalid

> Add total cluster capacity to AllocateResponse
> --
>
> Key: YARN-2463
> URL: https://issues.apache.org/jira/browse/YARN-2463
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> YARN-2448 exposes the ResourceCalculator being used by the scheduler so that 
> AMs can make better decisions when scheduling tasks. The 
> DominantResourceCalculator needs the total cluster capacity to function 
> correctly. We should add this information to the AllocateResponse.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2463) Add total cluster capacity to AllocateResponse

2014-09-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126558#comment-14126558
 ] 

Varun Vasudev commented on YARN-2463:
-

Not required anymore since we don't expose the resource calculator.

> Add total cluster capacity to AllocateResponse
> --
>
> Key: YARN-2463
> URL: https://issues.apache.org/jira/browse/YARN-2463
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>
> YARN-2448 exposes the ResourceCalculator being used by the scheduler so that 
> AMs can make better decisions when scheduling tasks. The 
> DominantResourceCalculator needs the total cluster capacity to function 
> correctly. We should add this information to the AllocateResponse.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-09-08 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126557#comment-14126557
 ] 

Tsuyoshi OZAWA commented on YARN-2517:
--

As Zhijie mentioned, we should have the callback if we need to check errors. 
IMHO, if we have a thread for the "onError" callback, we should also have 
"onEntitiesPut", since the added complexity is small and it's useful 
to distinguish connection-level exceptions from entity-level errors.
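
As a strawman for discussion, the callback interface could look something like this 
(the names are illustrative and not from the patch; TimelinePutResponse already carries 
the per-entity errors):
{code}
public interface TimelinePutCallback {
  // connection-level failure: the put request itself could not be completed
  void onError(Throwable cause);

  // the put went through; the response may still contain entity-level errors
  void onEntitiesPut(TimelinePutResponse response);
}
{code}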

> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread no to 
> block the current one.
> It's good to have a TimelineClientAsync like AMRMClientAsync and 
> NMClientAsync. It can buffer entities, put them in a separate thread, and 
> have callback to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126533#comment-14126533
 ] 

zhihai xu commented on YARN-1458:
-

I uploaded a new patch, "YARN-1458.alternative2.patch", which adds a new test 
case where all queues have a non-zero minShare:
queueA and queueB each have weight 0.5 and minShare 1024, and
the cluster has 8192 resources, so each queue should get a fair share of 4096.
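
For reference, the expected numbers work out as follows (a back-of-the-envelope check, 
not part of the patch):
{code}
// cluster resource: 8192; two queues, each with weight 0.5 and minShare 1024
// fair share per queue = max(weight * w2rRatio, minShare)
// choose w2rRatio so the shares exactly fill the cluster:
//   2 * max(0.5 * w2rRatio, 1024) = 8192  =>  w2rRatio = 8192
// => fair share per queue = max(0.5 * 8192, 1024) = 4096
{code}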

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, 
> yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submit lots jobs, it is not easy to reapear. We run the test cluster 
> for days to reapear it. The output of  jstack command on resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-1458:

Attachment: YARN-1458.alternative2.patch

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, 
> yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submit lots jobs, it is not easy to reapear. We run the test cluster 
> for days to reapear it. The output of  jstack command on resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126498#comment-14126498
 ] 

zhihai xu commented on YARN-1458:
-

Hi [~kasha], I just found an example that shows the first approach doesn't work 
when minShare is not zero (all queues have a non-zero minShare).
Here is the example:
We have 4 queues A, B, C and D; each has weight 0.25 and minShare 1024, and
the cluster has 6144 (6*1024) resources.
Using the first approach (comparing with the previous result), we exit early from 
the loop with each queue's fair share at 1024.
The reason is that computeShare returns the minShare value 1024 when rMax 
<= 2048 in the following code:
{code}
  private static int computeShare(Schedulable sched, double w2rRatio,
      ResourceType type) {
    double share = sched.getWeights().getWeight(type) * w2rRatio;
    share = Math.max(share, getResourceValue(sched.getMinShare(), type));
    share = Math.min(share, getResourceValue(sched.getMaxShare(), type));
    return (int) share;
  }
{code}
So for the first 12 iterations, currentRU is unchanged; it stays at the sum of 
all queues' minShares (4096).
If we use the second approach, we get the correct result: each queue's fair 
share is 1536.
In this case, the second approach is clearly better than the first approach;
the first approach can't handle the case where all queues have a non-zero minShare.

I will create a new test case in the second-approach patch.
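
To make the arithmetic concrete (my own back-of-the-envelope reading of the example 
above, not taken from either patch):
{code}
// 4 queues, weight 0.25 each, minShare 1024 each, cluster resource 6144
// computeShare clamps each share to at least its minShare:
//   share(queue) = max(0.25 * w2rRatio, 1024)
// While 0.25 * w2rRatio does not exceed the minShare, each queue's share is still
// 1024, so resourceUsedWithWeightToResourceRatio stays at 4 * 1024 = 4096 as the
// ratio doubles. Comparing against the previous iteration's total therefore sees
// "no change", and the first approach exits early with a fair share of 1024 per queue.
// The correct fixed point instead satisfies 4 * (0.25 * w2rRatio) = 6144,
// i.e. w2rRatio = 6144 and a fair share of 1536 per queue.
{code}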

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submit lots jobs, it is not easy to reapear. We run the test cluster 
> for days to reapear it. The output of  jstack command on resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourceman

[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126493#comment-14126493
 ] 

Hadoop QA commented on YARN-2494:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667321/YARN-2494.patch
  against trunk revision 7498dd7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4853//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4853//console

This message is automatically generated.

> [YARN-796] Node label manager API and storage implementations
> -
>
> Key: YARN-2494
> URL: https://issues.apache.org/jira/browse/YARN-2494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2494.patch, YARN-2494.patch
>
>
> This JIRA includes APIs and storage implementations of node label manager,
> NodeLabelManager is an abstract class used to manage labels of nodes in the 
> cluster, it has APIs to query/modify
> - Nodes according to given label
> - Labels according to given hostname
> - Add/remove labels
> - Set labels of nodes in the cluster
> - Persist/recover changes of labels/labels-on-nodes to/from storage
> And it has two implementations to store modifications
> - Memory based storage: It will not persist changes, so all labels will be 
> lost when RM restart
> - FileSystem based storage: It will persist/recover to/from FileSystem (like 
> HDFS), and all labels and labels-on-nodes will be recovered upon RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126481#comment-14126481
 ] 

Hadoop QA commented on YARN-2033:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667301/YARN-2033.9.patch
  against trunk revision 7498dd7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4852//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4852//console

This message is automatically generated.

> Investigate merging generic-history into the Timeline Store
> ---
>
> Key: YARN-2033
> URL: https://issues.apache.org/jira/browse/YARN-2033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
> YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
> YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, 
> YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, 
> YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch
>
>
> Having two different stores isn't amicable to generic insights on what's 
> happening with applications. This is to investigate porting generic-history 
> into the Timeline Store.
> One goal is to try and retain most of the client side interfaces as close to 
> what we have today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2494:
-
Attachment: YARN-2494.patch

> [YARN-796] Node label manager API and storage implementations
> -
>
> Key: YARN-2494
> URL: https://issues.apache.org/jira/browse/YARN-2494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2494.patch, YARN-2494.patch
>
>
> This JIRA includes APIs and storage implementations of node label manager,
> NodeLabelManager is an abstract class used to manage labels of nodes in the 
> cluster, it has APIs to query/modify
> - Nodes according to given label
> - Labels according to given hostname
> - Add/remove labels
> - Set labels of nodes in the cluster
> - Persist/recover changes of labels/labels-on-nodes to/from storage
> And it has two implementations to store modifications
> - Memory based storage: It will not persist changes, so all labels will be 
> lost when RM restart
> - FileSystem based storage: It will persist/recover to/from FileSystem (like 
> HDFS), and all labels and labels-on-nodes will be recovered upon RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2522) AHSClient may be not necessary

2014-09-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2522:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-1530)

> AHSClient may be not necessary
> --
>
> Key: YARN-2522
> URL: https://issues.apache.org/jira/browse/YARN-2522
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Per discussion in 
> [YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
>  it may be not necessary to have a separate AHSClient. The methods can be 
> incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED is also useless 
> then.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2522) AHSClient may be not necessary

2014-09-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2522:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-321

> AHSClient may be not necessary
> --
>
> Key: YARN-2522
> URL: https://issues.apache.org/jira/browse/YARN-2522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Per discussion in 
> [YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
>  it may be not necessary to have a separate AHSClient. The methods can be 
> incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED is also useless 
> then.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2522) AHSClient may be not necessary

2014-09-08 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2522:
-

 Summary: AHSClient may be not necessary
 Key: YARN-2522
 URL: https://issues.apache.org/jira/browse/YARN-2522
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Per discussion in 
[YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
 it may be not necessary to have a separate AHSClient. The methods can be 
incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED is also useless 
then.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1250) Generic history service should support application-acls

2014-09-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1250:
--
Attachment: YARN-1250.2.patch

Updated the patch according to the latest patch of YARN-2033.

> Generic history service should support application-acls
> ---
>
> Key: YARN-1250
> URL: https://issues.apache.org/jira/browse/YARN-1250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: GenericHistoryACLs.pdf, YARN-1250.1.patch, 
> YARN-1250.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1712) Admission Control: plan follower

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126397#comment-14126397
 ] 

Jian He commented on YARN-1712:
---

Thanks, Subra and Carlo, for working on the patch. Some comments and questions on the 
patch:

- I think the default queue can be initialized upfront when PlanQueue is 
initialized in CapacityScheduler
{code}
  // Add default queue if it doesnt exist
  if (scheduler.getQueue(defPlanQName) == null) {
{code}
- Consolidate the comments into 2 lines
{code}
  // identify the reservations that have expired and new reservations that
  // have to
  // be activated
{code}
- Exceptions like the following are ignored. Is this intentional?
{code}
 } catch (YarnException e) {
  LOG.warn(
  "Exception while trying to release default queue capacity for 
plan: {}",
  planQueueName, e);
}
{code}
- Maybe create a common method to calculate lhsRes and rhsRes (see the sketch at the end 
of this comment)
{code}
  CSQueue lhsQueue = scheduler.getQueue(lhs.getReservationId().toString());
  if (lhsQueue != null) {
lhsRes =
Resources.subtract(
lhs.getResourcesAtTime(now),
Resources.multiply(clusterResource,
lhsQueue.getAbsoluteCapacity()));
  } else {
lhsRes = lhs.getResourcesAtTime(now);
  }
{code}
- allocatedCapacity: maybe rename it to reservedResources
{code}
  Resource allocatedCapacity = Resource.newInstance(0, 0);
{code}
- Instead of doing the following:  
{code}
  for (CSQueue resQueue : resQueues) {
previousReservations.add(resQueue.getQueueName());
  }
  Set<String> expired =
      Sets.difference(previousReservations, curReservationNames);
  Set<String> toAdd =
      Sets.difference(curReservationNames, previousReservations);
{code}
we can do something like this to save some computation: 
{code}
for (String queue : previousReservations) {
  if (!curReservationNames.contains(queue)) {
    expired.add(queue);
  } else {
    // curReservationNames ends up containing only the queues to add
    curReservationNames.remove(queue);
  }
}
{code}
- Not sure if this method is only used by the PlanFollower. If it is, we can change 
the return value to be a set of reservation names so that we don't need to loop 
later to get all the reservation names.
{code}
  Set currentReservations =
  plan.getReservationsAtTime(now);
{code}
- rename defPlanQName to defReservationQueue
{code}
  String defPlanQName = planQueueName + PlanQueue.DEFAULT_QUEUE_SUFFIX;
{code}
- The apps are already in the current planQueue; IIUC, this should be the 
defaultReservationQueue? If so, I think we should change the queueName 
parameter to the proper defaultReservationQueue name. Also, 
AbstractYarnScheduler#moveAllApps actually expects the queue to be a 
leaf queue (ReservationQueue), not the planQueue (a parent queue).
{code}
// Move all the apps in these queues to the PlanQueue
moveAppsInQueues(toMove, planQueueName);
{code}
- I'm wondering whether we can make the PlanFollower move apps to the 
defaultQueue synchronously, for the following reasons:
{code}
1. IIUC, the logic for moveAll and killAll is: the first time 
synchronizePlan is called, it tries to move all expired apps; the next time 
synchronizePlan is called, it kills all the previously not-yet-moved apps. 
If the synchronizePlan interval is very small, it's likely to kill most apps 
that are still being moved.
2. Exceptions from CapacityScheduler#moveApplication are currently just 
ignored if the move is done asynchronously.
3. The PlanFollower anyway locks the whole scheduler in the synchronizePlan 
method (though I'm still thinking about whether we need to lock the whole scheduler, 
as this is kind of costly).
4. In AbstractYarnScheduler#moveAllApps, we can do the moveApp synchronously 
and still send events to the RMApp to update its bookkeeping if needed. (But I 
don't think we need to send the event now.)
5. The PlanFollower move logic should be much simpler if it is done synchronously.
{code}
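
Regarding the lhsRes/rhsRes point above, a possible shape for the common helper (an 
illustrative sketch only; the method name and the ReservationAllocation parameter type 
are assumptions, not taken from the patch):
{code}
private Resource getUnallocatedReservedResources(ReservationAllocation reservation,
    long now, Resource clusterResource) {
  // resources the reservation needs at time 'now', minus what its queue
  // has already been given in absolute terms
  CSQueue queue = scheduler.getQueue(reservation.getReservationId().toString());
  if (queue != null) {
    return Resources.subtract(reservation.getResourcesAtTime(now),
        Resources.multiply(clusterResource, queue.getAbsoluteCapacity()));
  }
  return reservation.getResourcesAtTime(now);
}
{code}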

> Admission Control: plan follower
> 
>
> Key: YARN-1712
> URL: https://issues.apache.org/jira/browse/YARN-1712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: reservations, scheduler
> Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.patch
>
>
> This JIRA tracks a thread that continuously propagates the current state of 
> an inventory subsystem to the scheduler. As the inventory subsystem store the 
> "plan" of how the resources should be subdivided, the work we propose in this 
> JIRA realizes such plan by dynamically instructing the CapacityScheduler to 
> add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2033:
--
Attachment: YARN-2033.9.patch

Fix one typo in the class name

> Investigate merging generic-history into the Timeline Store
> ---
>
> Key: YARN-2033
> URL: https://issues.apache.org/jira/browse/YARN-2033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
> YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
> YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, 
> YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, 
> YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch
>
>
> Having two different stores isn't amicable to generic insights on what's 
> happening with applications. This is to investigate porting generic-history 
> into the Timeline Store.
> One goal is to try and retain most of the client side interfaces as close to 
> what we have today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126340#comment-14126340
 ] 

Hadoop QA commented on YARN-2033:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667269/YARN-2033.8.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4851//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4851//console

This message is automatically generated.

> Investigate merging generic-history into the Timeline Store
> ---
>
> Key: YARN-2033
> URL: https://issues.apache.org/jira/browse/YARN-2033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
> YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
> YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, 
> YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
> YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch
>
>
> Having two different stores isn't amicable to generic insights on what's 
> happening with applications. This is to investigate porting generic-history 
> into the Timeline Store.
> One goal is to try and retain most of the client side interfaces as close to 
> what we have today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-09-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126284#comment-14126284
 ] 

Xuan Gong commented on YARN-2308:
-

bq. We can check if the queue exists on recovery. If not, directly return 
FAILED state and no need to add the attempts anymore. Thoughts ?

If we do this, the RMAppAttempt will show an *incorrect* state in the 
ApplicationHistoryStore.

> NPE happened when RM restart after CapacityScheduler queue configuration 
> changed 
> -
>
> Key: YARN-2308
> URL: https://issues.apache.org/jira/browse/YARN-2308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: chang li
>Priority: Critical
> Attachments: jira2308.patch, jira2308.patch, jira2308.patch
>
>
> I encountered a NPE when RM restart
> {code}
> 2014-07-16 07:22:46,957 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> And RM will be failed to restart.
> This is caused by queue configuration changed, I removed some queues and 
> added new queues. So when RM restarts, it tries to recover history 
> applications, and when any of queues of these applications removed, NPE will 
> be raised.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126272#comment-14126272
 ] 

Zhijie Shen commented on YARN-2320:
---

According to Vinod's comments: 
https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073

We may think of removing the old store stack directly.

> Removing old application history store after we store the history data to 
> timeline store
> 
>
> Key: YARN-2320
> URL: https://issues.apache.org/jira/browse/YARN-2320
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> After YARN-2033, we should deprecate application history store set. There's 
> no need to maintain two sets of store interfaces. In addition, we should 
> conclude the outstanding jira's under YARN-321 about the application history 
> store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-09-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2320:
--
Summary: Removing old application history store after we store the history 
data to timeline store  (was: Deprecate existing application history store 
after we store the history data to timeline store)

> Removing old application history store after we store the history data to 
> timeline store
> 
>
> Key: YARN-2320
> URL: https://issues.apache.org/jira/browse/YARN-2320
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> After YARN-2033, we should deprecate application history store set. There's 
> no need to maintain two sets of store interfaces. In addition, we should 
> conclude the outstanding jira's under YARN-321 about the application history 
> store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126259#comment-14126259
 ] 

Hadoop QA commented on YARN-2459:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667254/YARN-2459.6.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4850//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4850//console

This message is automatically generated.

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch
>
>
> If RM HA is enabled and used Zookeeper store for RM State Store.
> If for any reason Any app gets rejected and directly goes to NEW to FAILED
> then final transition makes that to RMApps and Completed Apps memory 
> structure but that doesn't make it to State store.
> Now when RMApps default limit reaches it starts deleting apps from memory and 
> store. In that case it try to delete this app from store and fails which 
> causes RM to crash.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126249#comment-14126249
 ] 

zhihai xu commented on YARN-1458:
-

Yes, that works; it can fix the zero-weight-with-non-zero-minShare case if we 
compare with the previous result.
But the alternative approach will be a little faster than the first approach 
(less computation, and fewer schedulables in the calculation after filtering 
out the fixed-share schedulables). Either approach is OK with me.
I will submit a patch based on the first approach, comparing with the previous result.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126240#comment-14126240
 ] 

Hadoop QA commented on YARN-1458:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667252/yarn-1458-5.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4849//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4849//console

This message is automatically generated.

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQu

[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-08 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2033:
--
Attachment: YARN-2033.8.patch

> Investigate merging generic-history into the Timeline Store
> ---
>
> Key: YARN-2033
> URL: https://issues.apache.org/jira/browse/YARN-2033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
> YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
> YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, 
> YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
> YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch
>
>
> Having two different stores isn't amenable to generic insights on what's 
> happening with applications. This is to investigate porting generic-history 
> into the Timeline Store.
> One goal is to try to retain most of the client-side interfaces as close to 
> what we have today as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126236#comment-14126236
 ] 

Zhijie Shen commented on YARN-2033:
---

[~vinodkv], thanks for your comments. I've updated the patch accordingly.

bq. RMApplicationHistoryWriter is not really needed anymore. We did document it 
to be unstable/alpha too. We can remove it directly instead of deprecating it. 
It's a burden to support two interface hierarchies. I'm okay doing it 
separately though.

That seems to make sense. I previously created a ticket for deprecating the old 
history store stack; let me update that JIRA.

bq. YarnClientImpl: Calls using AHSClient shouldn't rely on timeline-publisher 
yet, we should continue to use APPLICATION_HISTORY_ENABLED for that till we get 
rid of AHSClient altogether. We should file a ticket for this too.

In the newer patch, I reverted the change in YarnClientImpl, making it use 
APPLICATION_HISTORY_ENABLED, and ApplicationHistoryServer checks 
APPLICATION_HISTORY_STORE for backward compatibility. This can be simplified 
once the old history store stack is removed. I also simplified the configuration 
check in SystemMetricsPublisher, and I'll create a JIRA for getting rid of 
AHSClient.
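
For illustration, a minimal, hypothetical sketch of that backward-compatibility 
check (the exact wiring in the patch may differ; only the 
{{APPLICATION_HISTORY_STORE}} key is taken from the discussion):
{code}
// Hypothetical sketch only -- not taken from any attached patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HistoryStoreCompatSketch {
  /**
   * Keep honouring an explicitly configured generic-history store class for
   * backward compatibility; otherwise fall back to the timeline store.
   */
  static boolean useOldHistoryStore(Configuration conf) {
    String storeClass = conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE);
    return storeClass != null && !storeClass.isEmpty();
  }
}
{code}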

bq. You removed the unstable annotations from ApplicationContext APIs. We 
should retain them, this stuff isn't stable yet.

ApplicationContext is for internal usage only, not a user-facing interface, so I 
think the annotation should be removed to avoid confusing people.

bq. Rename YarnMetricsPublisher -> {Platform|System}MetricsPublisher to avoid 
confusing it with host/daemon metrics that exist outside today?

Renamed all yarnmetrics -> systemmetrics.

> Investigate merging generic-history into the Timeline Store
> ---
>
> Key: YARN-2033
> URL: https://issues.apache.org/jira/browse/YARN-2033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
> YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
> YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, 
> YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
> YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch
>
>
> Having two different stores isn't amicable to generic insights on what's 
> happening with applications. This is to investigate porting generic-history 
> into the Timeline Store.
> One goal is to try and retain most of the client side interfaces as close to 
> what we have today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126210#comment-14126210
 ] 

Karthik Kambatla commented on YARN-1458:


bq. the alternative approach can fix zero weight with non-zero minShare but the 
first approach can't
I see. Good point. I was wondering if there were cases we might want to check 
for {{if (currentRU - previousRU < epsilon || currentRU > totalResource)}}. The 
zero weight and non-zero minshare should be handled by such a check, no? 
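For illustration, a minimal, hypothetical sketch of such a progress check in the 
upper-bound search (not code from any attached patch; {{usedWithRatio()}} merely 
stands in for {{resourceUsedWithWeightToResourceRatio()}} and the numbers are 
made up):
{code}
// Hypothetical sketch of the "are we making progress?" guard; not the actual patch.
public class ProgressCheckSketch {
  // Stand-in for resourceUsedWithWeightToResourceRatio(): total share of all
  // schedulables at a given weight-to-resource ratio, rounded down.
  static long usedWithRatio(double ratio, double weight) {
    return (long) (weight * ratio);
  }

  static double findUpperBound(long totalResource, double weight) {
    double rMax = 1.0;
    long previousUsed = -1L;
    while (true) {
      long currentUsed = usedWithRatio(rMax, weight);
      // Stop once the cluster is covered, or once doubling the ratio no longer
      // changes the result (no progress, e.g. all weights are zero).
      if (currentUsed >= totalResource || currentUsed == previousUsed) {
        return rMax;
      }
      previousUsed = currentUsed;
      rMax *= 2.0;
    }
  }

  public static void main(String[] args) {
    System.out.println(findUpperBound(8192L, 0.0)); // terminates despite zero weight
    System.out.println(findUpperBound(8192L, 2.0)); // normal case: prints 4096.0
  }
}
{code}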

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126204#comment-14126204
 ] 

zhihai xu commented on YARN-1458:
-

Hi [~kasha], thanks for the review. The first approach has the advantage of 
simplicity and readability, but it can't cover all the corner cases: the 
alternative approach can fix zero weight with non-zero minShare, while the first 
approach can't.
Both approaches can fix zero weight with zero minShare. Keeping track of the 
resource usage from the previous iteration to see whether we are making progress 
may also have a limitation: for a very small weight, 
resourceUsedWithWeightToResourceRatio may keep returning 0 for multiple iterations.
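As a quick, hypothetical illustration of that corner case (the numbers are made 
up; the cast mirrors the integer rounding in the share computation):
{code}
// Hypothetical illustration: with a very small but non-zero weight, the computed
// share rounds down to 0 for many doublings of the ratio, so a naive
// "same-as-previous-result" check could bail out as if the weight were zero.
public class TinyWeightExample {
  public static void main(String[] args) {
    double weight = 1e-9;
    double ratio = 1.0;
    int doublings = 0;
    while ((long) (weight * ratio) == 0) { // share still rounds down to 0
      ratio *= 2.0;
      doublings++;
    }
    // Prints 30 for these numbers: the result only changes after ~30 doublings.
    System.out.println("doublings before a non-zero share: " + doublings);
  }
}
{code}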
thanks
zhihai

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126202#comment-14126202
 ] 

Jian He commented on YARN-415:
--

Looks good to me. Just one more question: I've lost the context for why we need 
this check; it seems we don't need it, because the returned 
ApplicationResourceUsageReport for a non-active attempt is null anyway.
{code}
// Only add in the running containers if this is the active attempt.
RMAppAttempt currentAttempt = rmContext.getRMApps()
   .get(attemptId.getApplicationId()).getCurrentAppAttempt();
if (currentAttempt != null &&
currentAttempt.getAppAttemptId().equals(attemptId)) {
{code}

> Capture aggregate memory allocation at the app-level for chargeback
> ---
>
> Key: YARN-415
> URL: https://issues.apache.org/jira/browse/YARN-415
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Kendall Thrapp
>Assignee: Andrey Klochkov
> Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
> YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
> YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
> YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
> YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
> YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
> YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
> YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
> YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
> YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
> YARN-415.201408150030.txt, YARN-415.201408181938.txt, 
> YARN-415.201408181938.txt, YARN-415.201408212033.txt, 
> YARN-415.201409040036.txt, YARN-415.patch
>
>
> For the purpose of chargeback, I'd like to be able to compute the cost of an
> application in terms of cluster resource usage.  To start out, I'd like to 
> get the memory utilization of an application.  The unit should be MB-seconds 
> or something similar and, from a chargeback perspective, the memory amount 
> should be the memory reserved for the application, as even if the app didn't 
> use all that memory, no one else was able to use it.
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
> container 2 * lifetime of container 2) + ... + (reserved ram for container n 
> * lifetime of container n)
> It'd be nice to have this at the app level instead of the job level because:
> 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
> appear on the job history server).
> 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
> This new metric should be available both through the RM UI and RM Web 
> Services REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126193#comment-14126193
 ] 

Hadoop QA commented on YARN-1709:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667251/YARN-1709.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4848//console

This message is automatically generated.

> Admission Control: Reservation subsystem
> 
>
> Key: YARN-1709
> URL: https://issues.apache.org/jira/browse/YARN-1709
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subramaniam Krishnan
> Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, 
> YARN-1709.patch
>
>
> This JIRA is about the key data structure used to track resources over time 
> to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of 
> how the scheduler will allocate resources over time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126172#comment-14126172
 ] 

Karthik Kambatla commented on YARN-1458:


By the way, I like the first approach mainly because of its simplicity and 
readability. 

In the while loop that was running forever, we could optionally keep track of 
the resource-usage from the previous iteration and see if we are making 
progress. 

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register

2014-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126162#comment-14126162
 ] 

Vinod Kumar Vavilapalli commented on YARN-2448:
---

+1, this looks good. Checking this in..

> RM should expose the resource types considered during scheduling when AMs 
> register
> --
>
> Key: YARN-2448
> URL: https://issues.apache.org/jira/browse/YARN-2448
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, 
> apache-yarn-2448.2.patch
>
>
> The RM should expose the name of the ResourceCalculator being used when AMs 
> register, as part of the RegisterApplicationMasterResponse.
> This will allow applications to make better scheduling decisions. MapReduce, 
> for example, only looks at memory when making its scheduling decisions, even 
> though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2459:
--
Attachment: YARN-2459.6.patch

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch
>
>
> If RM HA is enabled and ZooKeeper is used as the RM state store, and an app 
> gets rejected for any reason and goes directly from NEW to FAILED, the final 
> transition adds the app to the RMApps and completed-apps in-memory structures 
> but does not persist it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from 
> memory and from the store. It then tries to delete this app from the store, 
> fails, and crashes.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126158#comment-14126158
 ] 

Karthik Kambatla commented on YARN-1458:


Thanks Zhihai for working on this. I like the first approach: uploading a patch 
with minor nit fixes. Let me know if this looks good to you. 

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-09-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1458:
---
Attachment: yarn-1458-5.patch

> In Fair Scheduler, size based weight can cause update thread to hold lock 
> indefinitely
> --
>
> Key: YARN-1458
> URL: https://issues.apache.org/jira/browse/YARN-1458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.2.0
> Environment: Centos 2.6.18-238.19.1.el5 X86_64
> hadoop2.2.0
>Reporter: qingwu.fu
>Assignee: zhihai xu
>  Labels: patch
> Fix For: 2.2.1
>
> Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
> YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.alternative0.patch, 
> YARN-1458.alternative1.patch, YARN-1458.patch, yarn-1458-5.patch
>
>   Original Estimate: 408h
>  Remaining Estimate: 408h
>
> The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
> clients submitted lots of jobs; it is not easy to reproduce. We ran the test 
> cluster for days to reproduce it. The output of the jstack command on the 
> resourcemanager pid:
> {code}
>  "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
> waiting for monitor entry [0x43aa9000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
> - waiting to lock <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:744)
> ……
> "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
> runnable [0x433a2000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
> - locked <0x00070026b6e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
> at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126156#comment-14126156
 ] 

Jian He commented on YARN-2308:
---

Looked at this again; I think the solution mentioned by [~sunilg] is reasonable:
bq. During RMAppRecoveredTransition in RMAppImpl, maybe we can check whether the 
recovered app's queue (we can get this from the submission context) is still a 
valid queue? If the queue is not present, recovery for that app can be failed, 
and we may need to do some more RMApp cleanup. Sounds doable?
We can check whether the queue exists on recovery. If not, directly return the 
FAILED state, and there is no need to add the attempts anymore. Thoughts?
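
A rough, hypothetical sketch of that control flow (the types below are 
simplified stand-ins, not the real RMAppImpl or scheduler classes):
{code}
// Hypothetical sketch only; not code from any attached patch.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RecoveryQueueCheckSketch {

  enum RecoveredAppState { RUNNING, FAILED }

  /** Fail a recovered app directly if its queue no longer exists. */
  static RecoveredAppState recoverApp(String queueName, Set<String> configuredQueues) {
    if (!configuredQueues.contains(queueName)) {
      // The queue was removed from the scheduler configuration before restart:
      // fail the app here instead of adding its attempts and hitting the NPE
      // in addApplicationAttempt().
      return RecoveredAppState.FAILED;
    }
    return RecoveredAppState.RUNNING; // queue still exists, recover normally
  }

  public static void main(String[] args) {
    Set<String> queues = new HashSet<>(Arrays.asList("default", "analytics"));
    System.out.println(recoverApp("analytics", queues)); // RUNNING
    System.out.println(recoverApp("oldQueue", queues));  // FAILED
  }
}
{code}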


> NPE happened when RM restart after CapacityScheduler queue configuration 
> changed 
> -
>
> Key: YARN-2308
> URL: https://issues.apache.org/jira/browse/YARN-2308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: chang li
>Priority: Critical
> Attachments: jira2308.patch, jira2308.patch, jira2308.patch
>
>
> I encountered an NPE when the RM restarted:
> {code}
> 2014-07-16 07:22:46,957 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> And the RM then fails to restart.
> This is caused by a queue configuration change: I removed some queues and 
> added new ones. When the RM restarts, it tries to recover historical 
> applications, and when any of those applications' queues has been removed, an 
> NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-09-08 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1709:
---
Attachment: YARN-1709.patch

Thanks [~chris.douglas] for your exhaustive review. I am uploading a patch that 
has the following fixes:
  * Cloned _ZERO_RESOURCE_, _minimumAllocation_ and _maximumAllocation_ to 
prevent leaking of mutable data
  * Removed MessageFormat. I had to concatenate strings in a few cases where 
they are both logged and included as part of the exception message
  * Fixed the code readability and lock scope in _addReservation()_
  * Added assertions for _isWriteLockedByCurrentThread()_ in private methods 
that assume locks
  * Removed redundant _this_ in get methods
  * toString uses StringBuilder instead of StringBuffer now
  * Fixed Javadoc - content (_getEarliestStartTime()_) and whitespaces
  * Made _ReservationInterval_ immutable, good catch

The ReservationSystem uses UTCClock (added as part of YARN-1708) to enforce UTC 
times.  

> Admission Control: Reservation subsystem
> 
>
> Key: YARN-1709
> URL: https://issues.apache.org/jira/browse/YARN-1709
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subramaniam Krishnan
> Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, 
> YARN-1709.patch
>
>
> This JIRA is about the key data structure used to track resources over time 
> to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of 
> how the scheduler will allocate resources over time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores

2014-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126140#comment-14126140
 ] 

Vinod Kumar Vavilapalli commented on YARN-2440:
---

Just caught up with the discussion. I can get behind an absolute limit too, 
specifically in the context of heterogeneous clusters, where uniform percentage 
configurations can go really bad and the only resort is then per-node 
configuration - not ideal. Would that be a valid use case for putting in the 
absolute limit, [~jlowe]? Even if it were, I am okay punting that off to a 
separate JIRA.

Comments on the patch:
 - containers-limit-cpu-percentage -> 
{{yarn.nodemanager.resource.percentage-cpu-limit}} to be consistent? Similarly 
NM_CONTAINERS_CPU_PERC? I don't like the tag  'resource', it should have been 
'resources' but it is what it is.
 - You still have refs to YarnConfiguration.NM_CONTAINERS_CPU_ABSOLUTE in the 
patch. Similarly the javadoc in NodeManagerHardwareUtils needs to be updated if 
we are not adding the absolute cpu config. It should no longer refer to "number 
of cores that should be used for YARN containers"
 - TestCgroupsLCEResourcesHandler: You can use mockito if you only want to 
override num-processors in TestResourceCalculatorPlugin. Similarly in 
TestNodeManagerHardwareUtils.
 - The tests may fail on a machine with > 4 cores? :)
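
As a hypothetical illustration of the mockito suggestion above (only the mock is 
shown; the rest of the test wiring is omitted and assumed):
{code}
// Hypothetical sketch: pin the reported core count in a test so it does not
// depend on the build machine's hardware (e.g. the "> 4 cores" concern above).
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;

public class MockProcessorCountSketch {
  public static ResourceCalculatorPlugin pluginWithCores(int cores) {
    ResourceCalculatorPlugin plugin = mock(ResourceCalculatorPlugin.class);
    when(plugin.getNumProcessors()).thenReturn(cores);
    return plugin;
  }
}
{code}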

> Cgroups should allow YARN containers to be limited to allocated cores
> -
>
> Key: YARN-2440
> URL: https://issues.apache.org/jira/browse/YARN-2440
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, 
> apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, 
> screenshot-current-implementation.jpg
>
>
> The current cgroups implementation does not limit YARN containers to the 
> cores allocated in yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126135#comment-14126135
 ] 

Hadoop QA commented on YARN-2459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667234/YARN-2459.5.patch
  against trunk revision d989ac0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4847//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4847//console

This message is automatically generated.

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch, YARN-2459.5.patch
>
>
> If RM HA is enabled and ZooKeeper is used as the RM state store, and an app 
> gets rejected for any reason and goes directly from NEW to FAILED, the final 
> transition adds the app to the RMApps and completed-apps in-memory structures 
> but does not persist it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from 
> memory and from the store. It then tries to delete this app from the store, 
> fails, and crashes.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2256) Too many nodemanager and resourcemanager audit logs are generated

2014-09-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126119#comment-14126119
 ] 

Varun Saxena commented on YARN-2256:


bq. Someone correct me if I'm wrong, but I'm fairly certain that the intent of 
this information is to be the equivalent of the HDFS audit log. In other words, 
setting these to debug completely defeats the purpose. Instead, I suspect the 
real culprit is that the log4j settings are wrong for the node manager process.

[~aw], the issue raised was basically for both NM and RM; I have updated the 
description to reflect that. The issue here is that some of the container-related 
operations' audit logs in both NM and RM are too frequent and too numerous, which 
may impact performance as well.

There are two possible solutions: either remove these logs or change their log 
level, so that they do not appear in a live environment and can be enabled only 
when required.
As I wasn't sure whether these audit logs should be removed or not, I changed the 
log level for some of these logs in RM and for all of them in NM. To support 
this, I added printing of audit logs at different levels, as is done in HBase (as 
far as I know). This is handled as part of YARN-2287.

For NM you are correct: the log level can be changed in the log4j properties to 
suppress these logs if required. But for RM, since not all logs have to be 
suppressed, this can't be done, so to be consistent I added log levels for both 
NM and RM.

If it's agreeable to remove these audit logs, that can be a possible solution as 
well. Please suggest.
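
For the NM side, the log4j route would look roughly like the following sketch 
(the logger names assume the NMAuditLogger/RMAuditLogger classes; adjust as 
needed):
{code}
# Hypothetical log4j.properties sketch: raise the audit loggers' thresholds so
# the per-container INFO audit entries are suppressed unless re-enabled.
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger=WARN
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger=WARN
{code}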

> Too many nodemanager and resourcemanager audit logs are generated
> -
>
> Key: YARN-2256
> URL: https://issues.apache.org/jira/browse/YARN-2256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-2256.patch
>
>
> The following audit logs are generated too many times (due to the possibility 
> of a large number of containers):
> 1. In NM - audit logs corresponding to starting, stopping and finishing a 
> container
> 2. In RM - audit logs corresponding to the AM allocating a container and the AM 
> releasing a container
> We can have different log levels for NM and RM audit logs and move these 
> successful-container-related logs to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126073#comment-14126073
 ] 

Vinod Kumar Vavilapalli commented on YARN-2033:
---

Mostly looks fine, this is a rapidly changing part of the code-base! I get a 
feeling we need some umbrella cleanup effort to make consistent usage w.r.t 
history-service/timeline-service. Anyways, some comments
 - RMApplicationHistoryWriter is not really needed anymore. We did document it 
to be unstable/alpha too.  We can remove it directly instead of deprecating it. 
It's a burden to support two interface hierarchies. I'm okay doing it 
separately though.
 - YarnClientImpl: Calls using AHSClient shouldn't rely on timeline-publisher 
yet, we should continue to use APPLICATION_HISTORY_ENABLED for that till we get 
rid of AHSClient altogether. We should file a ticket for this too.
 - You removed the unstable annotations from ApplicationContext APIs. We should 
retain them, this stuff isn't stable yet.
 - Rename YarnMetricsPublisher -> {Platform|System}MetricsPublisher to avoid 
confusing it with host/daemon metrics that exist outside today?
 

> Investigate merging generic-history into the Timeline Store
> ---
>
> Key: YARN-2033
> URL: https://issues.apache.org/jira/browse/YARN-2033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, 
> YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, 
> YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, 
> YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, 
> YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch
>
>
> Having two different stores isn't amenable to generic insights on what's 
> happening with applications. This is to investigate porting generic-history 
> into the Timeline Store.
> One goal is to try to retain most of the client-side interfaces as close to 
> what we have today as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2459:
--
Attachment: YARN-2459.5.patch

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch, YARN-2459.5.patch
>
>
> If RM HA is enabled and ZooKeeper is used as the RM state store, and an app 
> gets rejected for any reason and goes directly from NEW to FAILED, the final 
> transition adds the app to the RMApps and completed-apps in-memory structures 
> but does not persist it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from 
> memory and from the store. It then tries to delete this app from the store, 
> fails, and crashes.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126037#comment-14126037
 ] 

Jian He commented on YARN-2459:
---

The new patch adds some comments in the test case.

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch, YARN-2459.5.patch
>
>
> If RM HA is enabled and the ZooKeeper store is used as the RM state store,
> and an app gets rejected for any reason and goes directly from NEW to FAILED,
> the final transition adds it to the RMApps and completed-apps memory
> structures but it never makes it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from
> memory and the store. It then tries to delete this app from the store and
> fails, which causes the RM to crash.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126036#comment-14126036
 ] 

Hadoop QA commented on YARN-2459:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667215/YARN-2459.4.patch
  against trunk revision df8c84c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4846//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4846//console

This message is automatically generated.

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch, YARN-2459.5.patch
>
>
> If RM HA is enabled and the ZooKeeper store is used as the RM state store,
> and an app gets rejected for any reason and goes directly from NEW to FAILED,
> the final transition adds it to the RMApps and completed-apps memory
> structures but it never makes it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from
> memory and the store. It then tries to delete this app from the store and
> fails, which causes the RM to crash.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2256) Too many nodemanager and resourcemanager audit logs are generated

2014-09-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2256:
---
Summary: Too many nodemanager and resourcemanager audit logs are generated  
(was: Too many nodemanager audit logs are generated)

> Too many nodemanager and resourcemanager audit logs are generated
> -
>
> Key: YARN-2256
> URL: https://issues.apache.org/jira/browse/YARN-2256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-2256.patch
>
>
> The following audit logs are generated too many times (due to the possibility of a 
> large number of containers):
> 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a 
> container
> 2. In RM - Audit logs corresponding to AM allocating a container and AM 
> releasing a container
> We can have different log levels even for NM and RM audit logs and move these 
> successful container related logs to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2014-09-08 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126005#comment-14126005
 ] 

Sandy Ryza commented on YARN-2154:
--

I'd like to add another constraint that I've been thinking about into the mix.  
We don't necessarily need to implement it in this JIRA, but I think it's worth 
considering how it would affect the approach.

A queue should only be able to preempt a container from another queue if every 
queue between the starved queue and their least common ancestor is starved.  
This essentially means that we consider preemption and fairness hierarchically. 
 If the "marketing" and "engineering" queues are square in terms of resources, 
starved teams in engineering shouldn't be able to take resources from queues in 
marketing - they should only be able to preempt from queues within engineering.
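
As an illustration of the constraint described above, here is a minimal sketch 
of the least-common-ancestor check; Queue, getParent() and isStarved() are 
hypothetical stand-ins, not the FairScheduler types:
{code}
// Sketch of the constraint only, not FairScheduler code: preemption on behalf
// of "starved" from a container in "from" is allowed only if every queue on
// the path from "starved" up to (but excluding) their least common ancestor
// is itself starved.
import java.util.HashSet;
import java.util.Set;

final class HierarchicalPreemptionCheck {
  interface Queue {
    Queue getParent();   // null for the root queue
    boolean isStarved();
  }

  static boolean mayPreemptFrom(Queue starved, Queue from) {
    // Ancestors of "from" (including itself); the first of these we meet while
    // walking up from "starved" is the least common ancestor.
    Set<Queue> fromAncestors = new HashSet<Queue>();
    for (Queue q = from; q != null; q = q.getParent()) {
      fromAncestors.add(q);
    }
    for (Queue q = starved; q != null && !fromAncestors.contains(q);
        q = q.getParent()) {
      if (!q.isStarved()) {
        return false;   // a non-starved queue below the LCA blocks preemption
      }
    }
    return true;
  }
}
{code}
With this check, a starved queue under "engineering" could preempt from a 
sibling under "engineering", but could take from "marketing" only if 
"engineering" itself is starved.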



> FairScheduler: Improve preemption to preempt only those containers that would 
> satisfy the incoming request
> --
>
> Key: YARN-2154
> URL: https://issues.apache.org/jira/browse/YARN-2154
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> Today, FairScheduler uses a spray-gun approach to preemption. Instead, it 
> should only preempt resources that would satisfy the incoming request. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125994#comment-14125994
 ] 

Vinod Kumar Vavilapalli commented on YARN-2080:
---

Some comments on the patch:
 - Configuration
-- admission.enable -> Rename to reservations.enable?
-- RM_SCHEDULER_ENABLE_RESERVATIONS -> RM_RESERVATIONS_ENABLE, 
DEFAULT_RM_SCHEDULER_ENABLE_RESERVATIONS -> DEFAULT_RM_RESERVATIONS_ENABLE
-- reservation.planfollower.time-step -> 
reservation-system.plan-follower.time-step
-- RM_PLANFOLLOWER_TIME_STEP, DEFAULT_RM_PLANFOLLOWER_TIME_STEP -> 
RM_RESERVATION_SYSTEM_PLAN_FOLLOWER_TIME_STEP, 
DEFAULT_RM_RESERVATION_SYSTEM_PLAN_FOLLOWER_TIME_STEP
- A meta question about configuration: it seems like if I pick a scheduler and 
enable reservations, the reservation-system class and the plan-follower should 
be picked up automatically instead of being standalone configs. Can we do that? 
Otherwise, the following:
-- reservation.class -> reservation-system.class? 
-- RM_RESERVATION, DEFAULT_RM_RESERVATION -> RM_RESERVATION_SYSTEM_CLASS, 
DEFAULT_RM_RESERVATION_SYSTEM_CLASS 
-- reservation.plan.follower -> reservation-system.plan-follower
-- RM_RESERVATION_PLAN_FOLLOWER, DEFAULT_RM_RESERVATION_PLAN_FOLLOWER -> 
RM_RESERVATION_SYSTEM_PLAN_FOLLOWER, DEFAULT_RM_RESERVATION_SYSTEM_PLAN_FOLLOWER
 - YarnClient.submitReservation(): We don't return a queue-name anymore after 
the latest YARN-1708? There are javadoc refs to the queue-name being returned.
 - ClientRMService
-- If reservations are not enabled, we get a host of "Reservation is not 
enabled. Please enable & try again" messages every time, which is not desirable. 
See checkReservationSystem(). This log and a bunch of similar logs in 
ReservationInputValidator should either be (1) deleted or (2) moved to the 
audit-log (RMAuditLogger) - we don't need to double-log
-- checkReservationACLs: Today anyone who can submit applications can also 
submit reservations. We may want to separate them, if you agree, I'll file a 
ticket for future separation of these ACLs.
 - AbstractReservationSystem
-- getPlanFollower() -> createPlanFollower()
-- create and init plan-follower should be in serviceInit()?
-- getNewReservationId(): Use ReservationId.newInstance()
 - ReservationInputValidator: Deleting a request shouldn't need 
validateReservationUpdateRequest->validateReservationDefinition. We only need 
the ID validation.
 - CapacitySchedulerConfiguration: I don't yet understand the semantics of the 
configs average-capacity, reservable.queue, reservation-window, 
reservation-enforcement-window and instantaneous-max-capacity, as they are 
not used in this patch. Can we drop them (and their setters/getters) here and 
move them to the JIRA that actually uses them?

Tests
 - TestYarnClient: You can use the newInstance methods and avoid using the PB 
implementations and the setters directly (e.g. {{new 
ReservationDeleteRequestPBImpl()}}); see the sketch at the end of this comment.
 - TestClientRMService:
-- ReservationRequest.setLeaseDuration() was renamed to be simply 
setDuration() in YARN-1708. Seems like there are other such occurrences in the 
patch.
-- Similarly to TestYarnClient, use the record.newInstance methods instead of 
directly invoking PBImpls.

Can't understand CapacityReservationSystem yet as I have to dig into the 
details of YARN-1709.
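
As a hedged illustration of the newInstance suggestion in the Tests section 
above (assuming the factories introduced by YARN-1708/YARN-1709, e.g. 
ReservationId.newInstance and ReservationDeleteRequest.newInstance, keep these 
shapes):
{code}
// Hedged example of the suggestion above: build test records through the
// newInstance factories instead of instantiating *PBImpl classes directly.
// The exact factory signatures follow YARN-1708/YARN-1709 and may differ.
import org.apache.hadoop.yarn.api.protocolrecords.ReservationDeleteRequest;
import org.apache.hadoop.yarn.api.records.ReservationId;

public class ReservationRecordExample {
  public static ReservationDeleteRequest buildDeleteRequest() {
    ReservationId reservationId =
        ReservationId.newInstance(System.currentTimeMillis(), 1L);
    // No direct dependency on ReservationDeleteRequestPBImpl in the test.
    return ReservationDeleteRequest.newInstance(reservationId);
  }
}
{code}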

> Admission Control: Integrate Reservation subsystem with ResourceManager
> ---
>
> Key: YARN-2080
> URL: https://issues.apache.org/jira/browse/YARN-2080
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subramaniam Krishnan
>Assignee: Subramaniam Krishnan
> Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch
>
>
> This JIRA tracks the integration of Reservation subsystem data structures 
> introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
> of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125990#comment-14125990
 ] 

Zhijie Shen commented on YARN-1530:
---

[~sjlee0], thanks for your feedback. Here're some additional thoughts and 
clarifications upon your comments.

bq. This option would make sense only if the imports are less frequent.

To be more specific, I mean that sending the same amount of entities (not too 
big; if too big, the HTTP REST request has to be chunked into several 
consecutive requests of reasonable size) via HTTP REST or via HDFS should 
perform similarly. HTTP REST may be better because of less secondary-storage 
I/O (Ethernet should be faster than disk). HTTP REST doesn't prevent the user 
from batching the entities and putting them in one call, and the current API 
supports it. It's up to the user to put each entity immediately for 
realtime/near-realtime inquiry, or to batch entities if they can tolerate some delay.
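
As a concrete illustration of the batching the current API already allows, 
here is a minimal sketch that buffers entities and publishes them in one 
putEntities() call; the class name, buffer size and flush policy are 
illustrative, not an existing API:
{code}
// Minimal sketch of client-side batching, assuming only the existing
// TimelineClient.putEntities(TimelineEntity...) API.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class BatchingTimelinePublisher {
  private static final int BATCH_SIZE = 100;   // illustrative threshold
  private final TimelineClient client;
  private final List<TimelineEntity> buffer = new ArrayList<TimelineEntity>();

  public BatchingTimelinePublisher(TimelineClient client) {
    this.client = client;
  }

  /** Buffer an entity; publish the whole batch in one REST call when full. */
  public synchronized void add(TimelineEntity entity) throws Exception {
    buffer.add(entity);
    if (buffer.size() >= BATCH_SIZE) {
      TimelinePutResponse response =
          client.putEntities(buffer.toArray(new TimelineEntity[buffer.size()]));
      // A caller can inspect response.getErrors() to decide whether to retry.
      buffer.clear();
    }
  }
}
{code}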

However, I agree that HDFS or some other single-node storage technique is an 
interesting way to avoid losing entities that have not been published to 
the timeline server yet, in particular when we batch them.

bq. Regarding option (2), I think your point is valid that it would be a 
transition from a thin client to a fat client.
bq. However, I'm not too sure if it would make changing the data store much 
more complicated than other scenarios.

I'm also not very sure about the necessary changes. As I mentioned before, the 
timeline server doesn't simply put the entities into the data store. One 
immediate problem I can come up with is authorization: I'm not sure it's 
logically correct to check the user's access in a client running on the 
user's side. If we move authorization to the data store, HBase supports access 
control, but LevelDB seems not to. And I'm not sure HBase access control is enough 
for the timeline server's specific logic. I still need to think more about it.

As the client grows fatter, it becomes difficult to maintain different versions 
of clients. For example, if we make an incompatible optimization to the 
storage schema, only the new client can write to it, while the old client 
will no longer work. Moreover, since most of the write logic would run in user 
land, which is not predictable, it is more likely to hit unexpected failures 
than a well-set-up server. In general, I prefer to keep the client simple, such 
that future client distribution and maintenance take less effort.

bq. But then again, if we consider a scenario such as a cluster of ATS 
instances, the same problem exists there.

Right, the same problem will exist on the server side, but the web frontend has 
isolated it from the users. Compared to the clients in the applications, the ATS 
instances are a relatively small, controllable set that we can pause and upgrade 
through a proper process. What do you think?

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --
>
> Key: YARN-1530
> URL: https://issues.apache.org/jira/browse/YARN-1530
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
> Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
> ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
> application timeline design-20140116.pdf, application timeline 
> design-20140130.pdf, application timeline design-20140210.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework 
> data all by itself as YARN doesn't have a common solution. This JIRA attempts 
> to solve the storage, management and serving of per-framework data from 
> various applications, both running and finished. The aim is to change YARN to 
> collect and store data in a generic manner with plugin points for frameworks 
> to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125959#comment-14125959
 ] 

Jian He commented on YARN-2459:
---

bq. Add one in TestRMRestart to get an app rejected and make sure that the 
final-status gets recorded
Added.
bq. Another one in RMStateStoreTestBase to ensure it is okay to have an 
updateApp call without a storeApp call like in this case.
Turns out RMStateStoreTestBase already has this test.
{code}
// test updating the state of an app/attempt whose initial state was not
// saved.
{code}

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch
>
>
> If RM HA is enabled and the ZooKeeper store is used as the RM state store,
> and an app gets rejected for any reason and goes directly from NEW to FAILED,
> the final transition adds it to the RMApps and completed-apps memory
> structures but it never makes it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from
> memory and the store. It then tries to delete this app from the store and
> fails, which causes the RM to crash.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled

2014-09-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2459:
--
Attachment: YARN-2459.4.patch

> RM crashes if App gets rejected for any reason and HA is enabled
> 
>
> Key: YARN-2459
> URL: https://issues.apache.org/jira/browse/YARN-2459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, 
> YARN-2459.4.patch
>
>
> If RM HA is enabled and the ZooKeeper store is used as the RM state store,
> and an app gets rejected for any reason and goes directly from NEW to FAILED,
> the final transition adds it to the RMApps and completed-apps memory
> structures but it never makes it to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from
> memory and the store. It then tries to delete this app from the store and
> fails, which causes the RM to crash.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2014-09-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125950#comment-14125950
 ] 

Karthik Kambatla commented on YARN-2154:


Just discussed this with [~ashwinshankar77] offline. He rightly pointed out that the 
sort order should take usage into account. I'll post what the order should be 
as soon as I get a chance to consult my notes.

> FairScheduler: Improve preemption to preempt only those containers that would 
> satisfy the incoming request
> --
>
> Key: YARN-2154
> URL: https://issues.apache.org/jira/browse/YARN-2154
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> Today, FairScheduler uses a spray-gun approach to preemption. Instead, it 
> should only preempt resources that would satisfy the incoming request. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2518) Support in-process container executor

2014-09-08 Thread BoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125907#comment-14125907
 ] 

BoYang commented on YARN-2518:
--

Yeah, there might be some issues with this, which need to be figured out. Thanks, 
Allen, for bringing this up. I am new to YARN and cannot clearly identify all 
potential issues yet.

My point is that this in-process container executor seems to be a generic need 
from different people. I have seen several discussions about this in my 
searches. Some use a dummy process (for example, Impala?) as a proxy to relay 
the task to the long-running process for further processing.

So if the YARN community can recognize the need for this common scenario, bring 
it up for further discussion, and explore the possibilities of supporting it 
natively, that would be really appreciated. It would probably benefit a lot 
of other people and projects as well, and make YARN an even more generic 
framework that can be adopted more broadly.



> Support in-process container executor
> -
>
> Key: YARN-2518
> URL: https://issues.apache.org/jira/browse/YARN-2518
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.0
> Environment: Linux, Windows
>Reporter: BoYang
>Priority: Minor
>  Labels: container, dispatch, in-process, job, node
>
> Node Manager always creates a new process for a new application. We have hit a 
> scenario where we want the node manager to execute the application inside its 
> own process, so we get fast response time. It would be nice if Node Manager 
> or YARN can provide native support for that.
> In general, the scenario is that we have a long running process which can 
> accept requests and process the requests inside its own process. Since YARN 
> is good at scheduling jobs, we want to use YARN to dispatch jobs (e.g. 
> requests in JSON) to the long running process. In that case, we do not want 
> YARN container to spin up a new process for each request. Instead, we want 
> YARN container to send the request to the long running process for further 
> processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-08 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125908#comment-14125908
 ] 

Sangjin Lee commented on YARN-1530:
---

{quote}
The bottleneck is still there. Essentially I don’t see any difference between 
publishing entities via HTTP REST interface and via HDFS in terms of 
scalability.
{quote}

IMO, option (1) necessarily entails less frequent imports into the store by 
ATS. Obviously, if ATS still imports the HDFS files at the same speed as the 
timeline entries are generated, there would be no difference in scalability. 
This option would make sense only if the imports are less frequent. It would 
also mean that, as a trade-off, reads would be more stale. I believe Robert's 
document covers all of those points.

Regarding option (2), I think your point is valid that it would be a transition 
from a thin client to a fat client. And along with that would be some 
complications as you point out.

However, I'm not too sure if it would make changing the data store much more 
complicated than other scenarios. I think the main problem of switching the 
data store is when not all writers are updated to point to the new data store. 
If writes are in progress, and the clients are being upgraded, there would be 
some inconsistencies between clients that were already upgraded and started 
writing to the new store and those that are not upgraded yet and still writing 
to the old store. If you have a single writer (such as the current ATS design), 
then it would be simpler. But then again, if we consider a scenario such as a 
cluster of ATS instances, the same problem exists there. I think that specific 
problem could be solved by holding the writes in some sort of a backup area 
(e.g. hdfs) before the switch starts, and recovering/re-enabling once all the 
writers are upgraded.

The idea of a cluster of ATS instances (multiple write/read instances) sounds 
interesting. It might be able to address the scalability/reliability problem at 
hand. We'd need to think through and poke holes to see if the idea holds up 
well, however. It would need to address how load balancing would be done and 
whether it would be left up to the user, for example.

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --
>
> Key: YARN-1530
> URL: https://issues.apache.org/jira/browse/YARN-1530
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
> Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
> ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
> application timeline design-20140116.pdf, application timeline 
> design-20140130.pdf, application timeline design-20140210.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework 
> data all by itself as YARN doesn't have a common solution. This JIRA attempts 
> to solve the storage, management and serving of per-framework data from 
> various applications, both running and finished. The aim is to change YARN to 
> collect and store data in a generic manner with plugin points for frameworks 
> to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125884#comment-14125884
 ] 

Hudson commented on YARN-2515:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1890 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1890/])
YARN-2515. Updated ConverterUtils#toContainerId to parse epoch. Contributed by 
Tsuyoshi OZAWA (jianhe: rev 0974f434c47ffbf4b77a8478937fd99106c8ddbd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestContainerId.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestConverterUtils.java
* hadoop-yarn-project/CHANGES.txt


> Update ConverterUtils#toContainerId to parse epoch
> --
>
> Key: YARN-2515
> URL: https://issues.apache.org/jira/browse/YARN-2515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.6.0
>
> Attachments: YARN-2515.1.patch, YARN-2515.2.patch
>
>
> ContainerId#toString was updated in YARN-2182. We should also update 
> ConverterUtils#toContainerId to parse epoch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2512) Allow for origin pattern matching in cross origin filter

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125886#comment-14125886
 ] 

Hudson commented on YARN-2512:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1890 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1890/])
YARN-2512. Allowed pattern matching for origins in CrossOriginFilter. 
Contributed by Jonathan Eagles. (zjshen: rev 
a092cdf32de4d752456286a9f4dda533d8a62bca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java


> Allow for origin pattern matching in cross origin filter
> 
>
> Key: YARN-2512
> URL: https://issues.apache.org/jira/browse/YARN-2512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.6.0
>
> Attachments: YARN-2512-v1.patch
>
>
> Extending the feature set of allowed origins. Now a "*" in a pattern 
> indicates this allowed origin is a pattern and will be matched including 
> multiple sub-domains.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2507) Document Cross Origin Filter Configuration for ATS

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125885#comment-14125885
 ] 

Hudson commented on YARN-2507:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1890 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1890/])
YARN-2507. Documented CrossOriginFilter configurations for the timeline server. 
Contributed by Jonathan Eagles. (zjshen: rev 
56dc496a1031621d2b701801de4ec29179d75f2e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Document Cross Origin Filter Configuration for ATS
> --
>
> Key: YARN-2507
> URL: https://issues.apache.org/jira/browse/YARN-2507
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, timelineserver
>Affects Versions: 2.6.0
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.6.0
>
> Attachments: YARN-2507-v1.patch
>
>
> CORS support was added for ATS as part of YARN-2277. This jira is to document 
> configuration for ATS CORS support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2097) Documentation: health check return status

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125852#comment-14125852
 ] 

Hadoop QA commented on YARN-2097:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646615/YARN-2097.1.patch
  against trunk revision 302d9a0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4845//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4845//console

This message is automatically generated.

> Documentation: health check return status
> -
>
> Key: YARN-2097
> URL: https://issues.apache.org/jira/browse/YARN-2097
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Allen Wittenauer
>Assignee: Rekha Joshi
>  Labels: newbie
> Attachments: YARN-2097.1.patch
>
>
> We need to document that the output of the health check script is ignored on 
> non-0 exit status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info

2014-09-08 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125846#comment-14125846
 ] 

Gera Shegalov commented on YARN-2377:
-

[~kasha], do you agree with the points above?

> Localization exception stack traces are not passed as diagnostic info
> -
>
> Key: YARN-2377
> URL: https://issues.apache.org/jira/browse/YARN-2377
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-2377.v01.patch
>
>
> In the Localizer log one can only see this kind of message
> {code}
> 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
> hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
>  1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos 
> tException: ha-nn-uri-0
> {code}
> And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is 
> propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2518) Support in-process container executor

2014-09-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125838#comment-14125838
 ] 

Allen Wittenauer commented on YARN-2518:


Sorry, I wasn't clear: if this feature goes in, it must fail the nodemanager 
process when security is enabled, because running tasks as the yarn user is 
extremely insecure.
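
A minimal sketch of the kind of startup guard suggested here; the class name 
and where the check would live are assumptions, while 
UserGroupInformation.isSecurityEnabled() and YarnRuntimeException are existing 
Hadoop/YARN classes:
{code}
// Sketch only of a startup guard for an in-process executor; the class name
// and the call site are illustrative.
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

public final class InProcessExecutorGuard {
  private InProcessExecutorGuard() {}

  /** Refuse to start when security is on: user code would run as the yarn user. */
  public static void checkNotSecure() {
    if (UserGroupInformation.isSecurityEnabled()) {
      throw new YarnRuntimeException(
          "The in-process container executor cannot be used with security enabled");
    }
  }
}
{code}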

> Support in-process container executor
> -
>
> Key: YARN-2518
> URL: https://issues.apache.org/jira/browse/YARN-2518
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.0
> Environment: Linux, Windows
>Reporter: BoYang
>Priority: Minor
>  Labels: container, dispatch, in-process, job, node
>
> Node Manager always creates a new process for a new application. We have hit a 
> scenario where we want the node manager to execute the application inside its 
> own process, so we get fast response time. It would be nice if Node Manager 
> or YARN can provide native support for that.
> In general, the scenario is that we have a long running process which can 
> accept requests and process the requests inside its own process. Since YARN 
> is good at scheduling jobs, we want to use YARN to dispatch jobs (e.g. 
> requests in JSON) to the long running process. In that case, we do not want 
> YARN container to spin up a new process for each request. Instead, we want 
> YARN container to send the request to the long running process for further 
> processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2518) Support in-process container executor

2014-09-08 Thread BoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125834#comment-14125834
 ] 

BoYang commented on YARN-2518:
--

In my rough testing, it did not fail the node manager process. In my Container 
Executor implementation (launchContainer method), I register a new application 
master, send a message to another long-running process, and unregister the 
application master. I can see the application finish successfully.

Of course, that was a very rough initial test. We could fine-tune the code 
to make it work better, but technically it seems doable now. So I am curious 
whether the YARN community could take this feature and provide official support.

> Support in-process container executor
> -
>
> Key: YARN-2518
> URL: https://issues.apache.org/jira/browse/YARN-2518
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.0
> Environment: Linux, Windows
>Reporter: BoYang
>Priority: Minor
>  Labels: container, dispatch, in-process, job, node
>
> Node Manager always creates a new process for a new application. We have hit a 
> scenario where we want the node manager to execute the application inside its 
> own process, so we get fast response time. It would be nice if Node Manager 
> or YARN can provide native support for that.
> In general, the scenario is that we have a long running process which can 
> accept requests and process the requests inside its own process. Since YARN 
> is good at scheduling jobs, we want to use YARN to dispatch jobs (e.g. 
> requests in JSON) to the long running process. In that case, we do not want 
> YARN container to spin up a new process for each request. Instead, we want 
> YARN container to send the request to the long running process for further 
> processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2097) Documentation: health check return status

2014-09-08 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2097:
---
Assignee: Rekha Joshi

> Documentation: health check return status
> -
>
> Key: YARN-2097
> URL: https://issues.apache.org/jira/browse/YARN-2097
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Allen Wittenauer
>Assignee: Rekha Joshi
>  Labels: newbie
> Attachments: YARN-2097.1.patch
>
>
> We need to document that the output of the health check script is ignored on 
> non-0 exit status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2422) yarn.scheduler.maximum-allocation-mb should not be hard-coded in yarn-default.xml

2014-09-08 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2422:
---
Assignee: Gopal V

> yarn.scheduler.maximum-allocation-mb should not be hard-coded in 
> yarn-default.xml
> -
>
> Key: YARN-2422
> URL: https://issues.apache.org/jira/browse/YARN-2422
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: YARN-2422.1.patch
>
>
> A cluster with 40Gb NMs refuses to run containers >8Gb.
> It was finally tracked down to yarn-default.xml hard-coding it to 8Gb.
> In the absence of a better override, it should default to 
> ${yarn.nodemanager.resource.memory-mb} instead of a hard-coded 8Gb.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-08 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2348:
---
Assignee: Leitao Guo

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch
>
>
> The ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default; this will confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2256) Too many nodemanager audit logs are generated

2014-09-08 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2256:
---
Assignee: Varun Saxena

> Too many nodemanager audit logs are generated
> -
>
> Key: YARN-2256
> URL: https://issues.apache.org/jira/browse/YARN-2256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-2256.patch
>
>
> The following audit logs are generated too many times (due to the possibility of a 
> large number of containers):
> 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a 
> container
> 2. In RM - Audit logs corresponding to AM allocating a container and AM 
> releasing a container
> We can have different log levels even for NM and RM audit logs and move these 
> successful container related logs to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2256) Too many nodemanager audit logs are generated

2014-09-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125808#comment-14125808
 ] 

Allen Wittenauer commented on YARN-2256:


Someone correct me if I'm wrong, but I'm fairly certain that the intent of this 
information is to be the equivalent of the HDFS audit log.  In other words, 
setting these to debug completely defeats the purpose.  Instead, I suspect the 
real culprit is that the log4j settings are wrong for the node manager process.

> Too many nodemanager audit logs are generated
> -
>
> Key: YARN-2256
> URL: https://issues.apache.org/jira/browse/YARN-2256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
> Attachments: YARN-2256.patch
>
>
> The following audit logs are generated too many times (due to the possibility of a 
> large number of containers):
> 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a 
> container
> 2. In RM - Audit logs corresponding to AM allocating a container and AM 
> releasing a container
> We can have different log levels even for NM and RM audit logs and move these 
> successful container related logs to DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2461) Fix PROCFS_USE_SMAPS_BASED_RSS_ENABLED property in YarnConfiguration

2014-09-08 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125809#comment-14125809
 ] 

Ray Chiang commented on YARN-2461:
--

Same observation as before.  No need for a new unit test for a fixed property 
value.

> Fix PROCFS_USE_SMAPS_BASED_RSS_ENABLED property in YarnConfiguration
> 
>
> Key: YARN-2461
> URL: https://issues.apache.org/jira/browse/YARN-2461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-2461-01.patch
>
>
> The property PROCFS_USE_SMAPS_BASED_RSS_ENABLED has an extra period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2518) Support in-process container executor

2014-09-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125797#comment-14125797
 ] 

Allen Wittenauer commented on YARN-2518:


This is pretty much incompatible with security. So it should probably fail the 
nodemanager process under that condition.

> Support in-process container executor
> -
>
> Key: YARN-2518
> URL: https://issues.apache.org/jira/browse/YARN-2518
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.0
> Environment: Linux, Windows
>Reporter: BoYang
>Priority: Minor
>  Labels: container, dispatch, in-process, job, node
>
> Node Manager always creates a new process for a new application. We have hit a 
> scenario where we want the node manager to execute the application inside its 
> own process, so we get fast response time. It would be nice if Node Manager 
> or YARN can provide native support for that.
> In general, the scenario is that we have a long running process which can 
> accept requests and process the requests inside its own process. Since YARN 
> is good at scheduling jobs, we want to use YARN to dispatch jobs (e.g. 
> requests in JSON) to the long running process. In that case, we do not want 
> YARN container to spin up a new process for each request. Instead, we want 
> YARN container to send the request to the long running process for further 
> processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-09-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125777#comment-14125777
 ] 

Zhijie Shen commented on YARN-2517:
---

[~vinodkv], thanks for your feedback. The reason an async client (or an async 
HTTP REST call) would be good is to avoid blocking the current thread when it 
is doing important management logic. For example, in YARN-2033 we have a 
bunch of logic to dispatch the entity-putting action onto a separate thread so 
that application life-cycle management can move on. Given an async client, that 
could be simplified considerably. From the user's point of view, I think it 
may be a useful feature as well.

I'm fine with either two classes, one sync and one async, or one class for 
both modes; the former option is consistent with the previous client design. I 
think a callback is necessary, at least "onError". TimelinePutResponse gives 
the user a summary of why an uploaded entity was not accepted by the timeline 
server. Based on the response, the user can determine whether the app should 
ignore the problem and move on, or stop immediately.
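
For illustration, a hypothetical callback contract along the lines discussed 
above; nothing like this exists in the current TimelineClient, and only 
TimelinePutResponse and its nested TimelinePutError are existing records:
{code}
// Hypothetical callback contract only; the interface and method names are
// illustrative, TimelinePutResponse.TimelinePutError is the existing record.
import java.util.List;

import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;

public interface TimelinePutCallback {
  /** The put went through, but the server rejected some entities. */
  void onError(List<TimelinePutResponse.TimelinePutError> errors);

  /** The put itself failed, e.g. the timeline server was unreachable. */
  void onFailure(Throwable cause);
}
{code}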

> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread so as not to 
> block the current one.
> It's good to have a TimelineClientAsync like AMRMClientAsync and 
> NMClientAsync. It can buffer entities, put them in a separate thread, and 
> have callback to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-09-08 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125760#comment-14125760
 ] 

Tsuyoshi OZAWA commented on YARN-2517:
--

Thanks for your comment, Vinod.

{quote}
an asynchronous write, the end of which they don't care about. I think we 
should simply have a mode in the existing client to post events asynchronously 
without any further need for call-back handlers.
{quote}

Makes sense. We can ensure at-most-once semantics without any callbacks. How 
about adding a {{flush()}} API to TimelineClient for the asynchronous mode? It 
would help users know whether the contents of the current buffer have been 
written to the Timeline Server or not.
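
A hypothetical sketch of the proposed flush() semantics for an asynchronous 
mode; the class, queue and method names are illustrative and not part of the 
existing TimelineClient API:
{code}
// Hypothetical sketch only: putAsync() buffers without blocking on the server,
// flush() publishes everything buffered so far so the caller knows it reached
// the Timeline Server.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class AsyncTimelinePublisher {
  private final TimelineClient client;
  private final BlockingQueue<TimelineEntity> buffer =
      new LinkedBlockingQueue<TimelineEntity>();

  public AsyncTimelinePublisher(TimelineClient client) {
    this.client = client;
  }

  /** Asynchronous put: buffer the entity without blocking on the server. */
  public void putAsync(TimelineEntity entity) {
    buffer.offer(entity);
  }

  /** Drain and publish the buffer; returns only when everything has been sent. */
  public void flush() throws Exception {
    TimelineEntity entity;
    while ((entity = buffer.poll()) != null) {
      client.putEntities(entity);
    }
  }
}
{code}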

> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread so as not to 
> block the current one.
> It's good to have a TimelineClientAsync like AMRMClientAsync and 
> NMClientAsync. It can buffer entities, put them in a separate thread, and 
> have callback to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch

2014-09-08 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125744#comment-14125744
 ] 

Tsuyoshi OZAWA commented on YARN-2515:
--

Thanks for your review, Jian!

> Update ConverterUtils#toContainerId to parse epoch
> --
>
> Key: YARN-2515
> URL: https://issues.apache.org/jira/browse/YARN-2515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.6.0
>
> Attachments: YARN-2515.1.patch, YARN-2515.2.patch
>
>
> ContainerId#toString was updated in YARN-2182. We should also update 
> ConverterUtils#toContainerId to parse epoch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125714#comment-14125714
 ] 

Vinod Kumar Vavilapalli commented on YARN-2517:
---

I am not entirely sure we need a parallel client for this. The other clients 
needed async clients because
 - they had loads of functionality that made sense in the blocking and 
non-blocking modes
 - the client code really needed call-back hooks to act on the results.

Timeline Client's only responsibility is to post events. There are only two 
use-cases: Clients need a sync write through, or an asynchronous write, the end 
of which they don't care about. I think we should simply have a mode in the 
existing client to post events asynchronously without any further need for 
call-back handlers.

What do others think?

> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread so as not to 
> block the current one.
> It's good to have a TimelineClientAsync like AMRMClientAsync and 
> NMClientAsync. It can buffer entities, put them in a separate thread, and 
> have callback to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125706#comment-14125706
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667181/YARN-913-002.patch
  against trunk revision 0974f43.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4844//console

This message is automatically generated.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-002.patch

Patch -002

# adds persistence policy
# {{RegistryOperationsService}} implements callbacks for various RM events, and 
implements the setup/purge behaviour underneath.
# adds a new class in the resource manager, {{RegistryService}}. This bridges 
from YARN to the registry by subscribing to application and container events, 
translating and forwarding to the  {{RegistryOperationsService}} where they may 
trigger setup/purge operations 
# Hooks this up to the RM
# Extends the DistributedShell by enabling it to register service records with 
the different persistence options.
# Adds a test to verify the distributed shell does register the entries, and 
that the purgeable ones are purged after the application completes.

This means the {{TestDistributedShell}} test is now capable of verifying that 
YARN applications can register themselves, that they can then be discovered, 
and that the RM cleans up after they terminate.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: yarnregistry.tla
yarnregistry.pdf
2014-09-08_YARN_Service_Registry.pdf

h3. Updated YARN service registry description

This adds a {{persistence}} field to service records, enabling the records to 
be automatically deleted (along with all child entries) when the application, 
app attempt or container is terminated.

h3. TLA+ service registry specification.

This is my initial attempt to define the expected behaviour of a service 
registry built atop zookeeper. Corrections welcome.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125673#comment-14125673
 ] 

Hadoop QA commented on YARN-2494:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667170/YARN-2494.patch
  against trunk revision 0974f43.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/4843//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.yarn.label.TestFileSystemNodeLabelManager
  org.apache.hadoop.yarn.label.TestNodeLabelManager

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4843//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4843//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4843//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4843//console

This message is automatically generated.

> [YARN-796] Node label manager API and storage implementations
> -
>
> Key: YARN-2494
> URL: https://issues.apache.org/jira/browse/YARN-2494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2494.patch
>
>
> This JIRA includes the APIs and storage implementations of the node label
> manager. NodeLabelManager is an abstract class used to manage labels of nodes
> in the cluster; it has APIs to query/modify:
> - Nodes according to a given label
> - Labels according to a given hostname
> - Add/remove labels
> - Set labels of nodes in the cluster
> - Persist/recover changes of labels/labels-on-nodes to/from storage
> It has two implementations to store modifications:
> - Memory based storage: it will not persist changes, so all labels will be
> lost when the RM restarts
> - FileSystem based storage: it will persist/recover to/from a FileSystem (like
> HDFS), and all labels and labels-on-nodes will be recovered upon RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125664#comment-14125664
 ] 

Hadoop QA commented on YARN-2517:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12667168/YARN-2517.1.patch
  against trunk revision 0974f43.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4842//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4842//console

This message is automatically generated.

> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread so as
> not to block the current one.
> It's good to have a TimelineClientAsync like AMRMClientAsync and
> NMClientAsync. It can buffer entities, put them in a separate thread, and
> have callbacks to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2492) (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125659#comment-14125659
 ] 

Wangda Tan commented on YARN-2492:
--

Uploaded the first version of the NodeLabelManager API and implementation patch 
to YARN-2494. It doesn't rely on YARN-2493, so it can be applied to the current 
trunk directly.

> (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests 
> 
>
> Key: YARN-2492
> URL: https://issues.apache.org/jira/browse/YARN-2492
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>
> Since YARN-796 is a sub JIRA of YARN-397, this JIRA is used to create and 
> track sub tasks and attach split patches for YARN-796.
> *Let's still keep over-all discussions on YARN-796.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-09-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2494:
-
Attachment: YARN-2494.patch

Attached a patch with the NodeLabelManager API and storage implementations, 
plus some PB-related changes (more than half of the patch).

Please kindly review.
Thanks!
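
For readers skimming the patch, here is a rough sketch of the abstract API
outlined in the JIRA description quoted below. The names and signatures are
assumptions for illustration only; the attached YARN-2494.patch may differ.

{code}
// Sketch only: signatures are assumptions based on the JIRA description,
// not the attached YARN-2494.patch.
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public abstract class NodeLabelManagerSketch {

  /** Nodes currently carrying the given label. */
  public abstract Set<String> getNodesWithLabel(String label);

  /** Labels currently assigned to the given hostname. */
  public abstract Set<String> getLabelsOnNode(String hostname);

  /** Add labels to the cluster-wide label collection. */
  public abstract void addLabels(Set<String> labels) throws IOException;

  /** Remove labels from the cluster-wide label collection. */
  public abstract void removeLabels(Set<String> labels) throws IOException;

  /** Replace the labels on each of the listed nodes. */
  public abstract void setLabelsOnNodes(Map<String, Set<String>> hostToLabels)
      throws IOException;

  /**
   * Persist a change. A memory-backed store can make this a no-op (labels are
   * lost on RM restart); a FileSystem-backed store writes to e.g. HDFS.
   */
  protected abstract void persist() throws IOException;

  /** Recover labels and labels-on-nodes from storage after RM restart. */
  public abstract void recover() throws IOException;
}
{code}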

> [YARN-796] Node label manager API and storage implementations
> -
>
> Key: YARN-2494
> URL: https://issues.apache.org/jira/browse/YARN-2494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2494.patch
>
>
> This JIRA includes the APIs and storage implementations of the node label
> manager. NodeLabelManager is an abstract class used to manage labels of nodes
> in the cluster; it has APIs to query/modify:
> - Nodes according to a given label
> - Labels according to a given hostname
> - Add/remove labels
> - Set labels of nodes in the cluster
> - Persist/recover changes of labels/labels-on-nodes to/from storage
> It has two implementations to store modifications:
> - Memory based storage: it will not persist changes, so all labels will be
> lost when the RM restarts
> - FileSystem based storage: it will persist/recover to/from a FileSystem (like
> HDFS), and all labels and labels-on-nodes will be recovered upon RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2517) Implement TimelineClientAsync

2014-09-08 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2517:
-
Attachment: YARN-2517.1.patch

Attached a first patch for review. The differences between TimelineClientAsync 
and TimelineClient are as follows:

* TimelineClientAsyncImpl has 2 blocking queues and 2 threads: {{requestQueue}} 
is for queuing requests from {{TimelineClientAsync#putEntities}}, and 
{{responseQueue}} is for queuing responses and errors from 
{{TimelineClientImpl#putEntities}}. {{dispatcherThread}} dequeues requests from 
{{requestQueue}} and dispatches them to the TimelineServer. {{handlerThread}} 
dequeues results of {{TimelineClient#putEntities}} and calls back into the 
user-defined methods of CallbackHandler. (A simplified sketch of this 
queue/thread structure follows below.)
* CallbackHandler has two APIs for users: onEntitiesPut is an API for receiving 
the results of putEntities, and onError is an API for handling errors. If 
Configuration#TIMELINE_SERVICE_ENABLED is false, the results of putEntities are 
returned via Callback#onEntitiesPut.
* {{void TimelineClientAsync#putEntities}} can throw InterruptedException 
because it uses {{BlockingQueue#put}} in {{TimelineClientAsyncImpl}}, though in 
practice it should rarely block because the queue length is configured as 
Integer.MAX_VALUE. We could add a configuration for controlling the memory 
consumption of the queues.
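
As a reading aid, here is a simplified, self-contained sketch of the
two-queue/two-thread structure described above. Type names and method
signatures are placeholders, not the classes in YARN-2517.1.patch.

{code}
// Simplified illustration of the two-queue/two-thread structure described in
// the comment above. Type names and signatures are placeholders, not the
// classes in YARN-2517.1.patch.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TimelineClientAsyncSketch {

  /** User-facing callbacks, mirroring the onEntitiesPut/onError idea. */
  interface CallbackHandler {
    void onEntitiesPut(Object response);
    void onError(Throwable error);
  }

  private final BlockingQueue<Object[]> requestQueue = new LinkedBlockingQueue<>();
  private final BlockingQueue<Object> responseQueue = new LinkedBlockingQueue<>();

  TimelineClientAsyncSketch(CallbackHandler handler) {
    // Dispatcher: dequeues entity batches and performs the blocking put.
    Thread dispatcher = new Thread(() -> {
      try {
        while (true) {
          Object[] entities = requestQueue.take();
          try {
            responseQueue.put(doBlockingPut(entities));
          } catch (Exception e) {
            responseQueue.put(e);   // errors travel the same queue
          }
        }
      } catch (InterruptedException ignored) {
        // shut down quietly
      }
    }, "dispatcher");

    // Handler: dequeues results/errors and invokes the user callbacks.
    Thread resultHandler = new Thread(() -> {
      try {
        while (true) {
          Object result = responseQueue.take();
          if (result instanceof Throwable) {
            handler.onError((Throwable) result);
          } else {
            handler.onEntitiesPut(result);
          }
        }
      } catch (InterruptedException ignored) {
        // shut down quietly
      }
    }, "handler");

    dispatcher.setDaemon(true);
    resultHandler.setDaemon(true);
    dispatcher.start();
    resultHandler.start();
  }

  /** Can block (and throw InterruptedException) if the queue is bounded. */
  void putEntities(Object... entities) throws InterruptedException {
    requestQueue.put(entities);
  }

  /** Stand-in for the synchronous TimelineClient#putEntities call. */
  private Object doBlockingPut(Object[] entities) throws Exception {
    return "put " + entities.length + " entities";
  }
}
{code}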


> Implement TimelineClientAsync
> -
>
> Key: YARN-2517
> URL: https://issues.apache.org/jira/browse/YARN-2517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2517.1.patch
>
>
> In some scenarios, we'd like to put timeline entities in another thread so as
> not to block the current one.
> It's good to have a TimelineClientAsync like AMRMClientAsync and
> NMClientAsync. It can buffer entities, put them in a separate thread, and
> have callbacks to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2512) Allow for origin pattern matching in cross origin filter

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125511#comment-14125511
 ] 

Hudson commented on YARN-2512:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1865 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1865/])
YARN-2512. Allowed pattern matching for origins in CrossOriginFilter. 
Contributed by Jonathan Eagles. (zjshen: rev 
a092cdf32de4d752456286a9f4dda533d8a62bca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java


> Allow for origin pattern matching in cross origin filter
> 
>
> Key: YARN-2512
> URL: https://issues.apache.org/jira/browse/YARN-2512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.6.0
>
> Attachments: YARN-2512-v1.patch
>
>
> Extends the feature set of allowed origins. A "*" in an allowed origin now
> marks it as a pattern, which will be matched against origins spanning
> multiple sub-domains.
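
As an illustration of the matching behaviour described above, here is a
standalone sketch of wildcard origin matching. It is not the CrossOriginFilter
code committed here, just a minimal example of the idea.

{code}
// Illustrative only: a sketch of how a "*"-style allowed-origin pattern could
// be matched, not the actual CrossOriginFilter implementation.
import java.util.regex.Pattern;

final class OriginPatternSketch {

  /** Turns e.g. "http://*.example.com" into a case-insensitive regex. */
  static Pattern compile(String allowedOrigin) {
    String regex = Pattern.quote(allowedOrigin).replace("*", "\\E.*\\Q");
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
  }

  /** A "*" marks the allowed origin as a pattern; otherwise match exactly. */
  static boolean isAllowed(String origin, String allowedOrigin) {
    if (!allowedOrigin.contains("*")) {
      return allowedOrigin.equalsIgnoreCase(origin);
    }
    return compile(allowedOrigin).matcher(origin).matches();
  }

  public static void main(String[] args) {
    // The pattern matches nested sub-domains such as a.b.example.com.
    System.out.println(isAllowed("http://a.b.example.com", "http://*.example.com")); // true
    System.out.println(isAllowed("http://evil.com", "http://*.example.com"));        // false
  }
}
{code}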



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2507) Document Cross Origin Filter Configuration for ATS

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125510#comment-14125510
 ] 

Hudson commented on YARN-2507:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1865 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1865/])
YARN-2507. Documented CrossOriginFilter configurations for the timeline server. 
Contributed by Jonathan Eagles. (zjshen: rev 
56dc496a1031621d2b701801de4ec29179d75f2e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Document Cross Origin Filter Configuration for ATS
> --
>
> Key: YARN-2507
> URL: https://issues.apache.org/jira/browse/YARN-2507
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, timelineserver
>Affects Versions: 2.6.0
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.6.0
>
> Attachments: YARN-2507-v1.patch
>
>
> CORS support was added for ATS as part of YARN-2277. This JIRA is to document
> the configuration for ATS CORS support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125509#comment-14125509
 ] 

Hudson commented on YARN-2515:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1865 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1865/])
YARN-2515. Updated ConverterUtils#toContainerId to parse epoch. Contributed by 
Tsuyoshi OZAWA (jianhe: rev 0974f434c47ffbf4b77a8478937fd99106c8ddbd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestConverterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestContainerId.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java


> Update ConverterUtils#toContainerId to parse epoch
> --
>
> Key: YARN-2515
> URL: https://issues.apache.org/jira/browse/YARN-2515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.6.0
>
> Attachments: YARN-2515.1.patch, YARN-2515.2.patch
>
>
> ContainerId#toString was updated in YARN-2182. We should also update
> ConverterUtils#toContainerId to parse the epoch.
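
A toy sketch of the parsing concern follows. The ID layout assumed here,
container[_e<epoch>]_<clusterTimestamp>_<appId>_<attemptId>_<containerId>, is an
illustration only and may not match the exact format used by Hadoop.

{code}
// Sketch only: the assumed string layout is for illustration, not the exact
// Hadoop ContainerId format or the ConverterUtils implementation.
final class ContainerIdParseSketch {

  static long parse(String id) {
    String[] parts = id.split("_");
    int i = 1;
    long epoch = 0;
    // Newer IDs carry an epoch token such as "e17"; older ones omit it.
    if (parts[i].startsWith("e")) {
      epoch = Long.parseLong(parts[i].substring(1));
      i++;
    }
    long clusterTimestamp = Long.parseLong(parts[i++]);
    int appId = Integer.parseInt(parts[i++]);
    int attemptId = Integer.parseInt(parts[i++]);
    long containerId = Long.parseLong(parts[i]);
    System.out.printf("epoch=%d ts=%d app=%d attempt=%d container=%d%n",
        epoch, clusterTimestamp, appId, attemptId, containerId);
    return epoch;
  }

  public static void main(String[] args) {
    parse("container_1410901177871_0001_01_000005");      // no epoch
    parse("container_e17_1410901177871_0001_01_000005");  // epoch = 17
  }
}
{code}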



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2507) Document Cross Origin Filter Configuration for ATS

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125419#comment-14125419
 ] 

Hudson commented on YARN-2507:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #674 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/674/])
YARN-2507. Documented CrossOriginFilter configurations for the timeline server. 
Contributed by Jonathan Eagles. (zjshen: rev 
56dc496a1031621d2b701801de4ec29179d75f2e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Document Cross Origin Filter Configuration for ATS
> --
>
> Key: YARN-2507
> URL: https://issues.apache.org/jira/browse/YARN-2507
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, timelineserver
>Affects Versions: 2.6.0
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.6.0
>
> Attachments: YARN-2507-v1.patch
>
>
> CORS support was added for ATS as part of YARN-2277. This JIRA is to document
> the configuration for ATS CORS support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125418#comment-14125418
 ] 

Hudson commented on YARN-2515:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #674 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/674/])
YARN-2515. Updated ConverterUtils#toContainerId to parse epoch. Contributed by 
Tsuyoshi OZAWA (jianhe: rev 0974f434c47ffbf4b77a8478937fd99106c8ddbd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestConverterUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestContainerId.java


> Update ConverterUtils#toContainerId to parse epoch
> --
>
> Key: YARN-2515
> URL: https://issues.apache.org/jira/browse/YARN-2515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Fix For: 2.6.0
>
> Attachments: YARN-2515.1.patch, YARN-2515.2.patch
>
>
> ContainerId#toString was updated in YARN-2182. We should also update
> ConverterUtils#toContainerId to parse the epoch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2512) Allow for origin pattern matching in cross origin filter

2014-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125420#comment-14125420
 ] 

Hudson commented on YARN-2512:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #674 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/674/])
YARN-2512. Allowed pattern matching for origins in CrossOriginFilter. 
Contributed by Jonathan Eagles. (zjshen: rev 
a092cdf32de4d752456286a9f4dda533d8a62bca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* hadoop-yarn-project/CHANGES.txt


> Allow for origin pattern matching in cross origin filter
> 
>
> Key: YARN-2512
> URL: https://issues.apache.org/jira/browse/YARN-2512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.6.0
>
> Attachments: YARN-2512-v1.patch
>
>
> Extends the feature set of allowed origins. A "*" in an allowed origin now
> marks it as a pattern, which will be matched against origins spanning
> multiple sub-domains.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)