[jira] [Updated] (YARN-2768) Avoid cloning Resource in FSAppAttempt#updateDemand
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2768: --- Summary: Avoid cloning Resource in FSAppAttempt#updateDemand (was: Avoid cloning Resource in FSAppAttempt.updateDemand) Avoid cloning Resource in FSAppAttempt#updateDemand --- Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
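For context, a minimal sketch of the in-place accumulation the description proposes. The helper name {{multiplyAndAddTo}} is an assumption for illustration and may not match what the committed patch actually adds:

{code}
// Sketch only: fold the multiply into the accumulation so no intermediate
// Resource is cloned. Helper name and int rounding are illustrative.
public static Resource multiplyAndAddTo(Resource lhs, Resource rhs, double by) {
  lhs.setMemory(lhs.getMemory() + (int) (rhs.getMemory() * by));
  lhs.setVirtualCores(lhs.getVirtualCores() + (int) (rhs.getVirtualCores() * by));
  return lhs;
}

// FSAppAttempt#updateDemand would then accumulate directly into demand:
//   Resources.multiplyAndAddTo(demand, r.getCapability(), r.getNumContainers());
{code}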
[jira] [Updated] (YARN-2768) Avoid cloning Resource in FSAppAttempt.updateDemand
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2768: --- Summary: Avoid cloning Resource in FSAppAttempt.updateDemand (was: optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread) Avoid cloning Resource in FSAppAttempt.updateDemand --- Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start
[ https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-3919: Attachment: 0003-YARN-3919.patch The current patch does not apply on my machine, so I am regenerating the same patch from my machine and uploading it so HadoopQA can kick off a run before commit. NPEs' while stopping service after exception during CommonNodeLabelsManager#start - Key: YARN-3919 URL: https://issues.apache.org/jira/browse/YARN-3919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, YARN-3919.02.patch We get NPE during CommonNodeLabelsManager#serviceStop and AsyncDispatcher#serviceStop if ConnectException on call to CommonNodeLabelsManager#serviceStart occurs. {noformat} 2015-07-10 19:39:37,825 WARN main-EventThread org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) {noformat} {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) {noformat} These NPEs' fill up the logs. Although, this doesn't cause any functional issue but its a nuisance and we ideally should have null checks in serviceStop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
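A minimal sketch of the null-guarded shutdown the description asks for, assuming a store-like field that may still be uninitialized when serviceStart fails partway through; the field name is illustrative:

{code}
// Sketch only: guard resources that may never have been initialized when
// serviceStart() failed before completing. Field name is illustrative.
@Override
protected void serviceStop() throws Exception {
  if (store != null) {
    store.close();
  }
  super.serviceStop();
}
{code}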
[jira] [Commented] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
[ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646344#comment-14646344 ] Robert Kanter commented on YARN-3950: - Thanks! Add unique SHELL_ID environment variable to DistributedShell Key: YARN-3950 URL: https://issues.apache.org/jira/browse/YARN-3950 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: YARN-3950.001.patch, YARN-3950.002.patch As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], it would be useful to have a monotonically increasing and independent ID of some kind that is unique per shell in the distributed shell program. We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
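For illustration, a sketch of how the AM could hand each shell a unique id through the container environment; the counter field and its placement in the launch path are assumptions, not the committed patch, and the variable name follows the later rename to YARN_SHELL_ID (see the update further below):

{code}
// Sketch only: assign a monotonically increasing id per launched shell and
// expose it via the container environment. Names are illustrative.
private final AtomicInteger shellIdCounter = new AtomicInteger(0);

// In the AM's container-launch path:
Map<String, String> env = new HashMap<>(shellEnv);
env.put("YARN_SHELL_ID", String.valueOf(shellIdCounter.incrementAndGet()));
// env is then passed into the ContainerLaunchContext for this container.
{code}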
[jira] [Commented] (YARN-3989) Show messages only for NodeLabel commands in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646361#comment-14646361 ] Bibin A Chundatt commented on YARN-3989: [~sunilg] and [~Naganarasimha] Currently I would like to keep only the NodeLabel commands in scope for this JIRA; the other commands make it more complicated. A few such commands, as [~sunilg] pointed out: # addToClusterNodeLabels # removeFromClusterNodeLabels # replaceLabelsOnNode A huge stack trace is shown on the console, and from the trace it becomes very difficult to get the actual error message. In most cases only direct messages are intended for users, not the full stack trace. So as of now I will limit the scope of this JIRA to the NodeLabel commands. Show messages only for NodeLabel commands in RMAdminCLI Key: YARN-3989 URL: https://issues.apache.org/jira/browse/YARN-3989 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Currently, on node-label command execution failure, the full exception stack trace is shown. This JIRA is to handle the exceptions and show only the messages. As per the discussion in YARN-3963 [~sunilg] {quote} As I see it, I can see the full exception stack trace on the client side in this case (also in case of other commands too) and it's too verbose. I think we can make it compact and then it will be more easily readable. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
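A hedged sketch of the kind of change being scoped here: catch the exception in the node-label command handlers and print only its message. Method and variable names are illustrative:

{code}
// Sketch only: report the error message for node-label commands instead of
// dumping the full stack trace to the console. Names are illustrative.
try {
  adminProtocol.addToClusterNodeLabels(request);
} catch (IOException | YarnException e) {
  System.err.println(e.getMessage());  // message only, no stack trace
  return -1;
}
{code}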
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646374#comment-14646374 ] Karthik Kambatla commented on YARN-2768: +1 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2768) Avoid cloning Resource in FSAppAttempt#updateDemand
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-2768. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Just committed to trunk and branch-2. Thanks [~zhiguohong] for the contribution. Avoid cloning Resource in FSAppAttempt#updateDemand --- Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.8.0 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646465#comment-14646465 ] Li Lu commented on YARN-3049: - Thanks [~zjshen]! For now I think it's fine to include the changes on app2flow table. I'll take a look at your latest patch. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1643) Make ContainersMonitor can support change monitoring size of an allocated container in NM side
[ https://issues.apache.org/jira/browse/YARN-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646403#comment-14646403 ] MENG DING commented on YARN-1643: - Thanks [~jianhe] for the review: bq. Here, memory * 2^20, but it gets reverted later on at info.pmemLimit 20, we can just use the original value ? Will do. bq. Do you think we can change the trackingContainers to be concurrentHashMap and update the ptInfo directly ? Also the getter and setter of ptInfo can synchronize on the ptInfo object Yes, we can make {{trackingContainers}} a {{ConcurrentHashMap}}, and add setters to {{ProcessTreeInfo}} for vmemLimit, pmemLimit, and cpuVcores, and then have the getters and setters synchronized on the object. IIUC, the main benefit is that we don't need to synchronize on the {{enforceResourceLimits}} call, which can be heavy, right? If that is the case, we probably also need to have proper synchronization for {{ResourceCalculatorProcessTree}}, e.g., {{ProcfsBasedProcessTree}}/{{WindowsBasedProcessTree}}? These objects could be updated by multiple threads as well. I was afraid that the code change may be too much? For other objects like {{containersToBeChanged}}, {{containersToBeAdded}}, {{containersToBeRemoved}}, I think we still need to synchronize on the entire map like the way it is right now, because we are calling functions like {{containersToBeRemoved.clear()}}. Thoughts? Make ContainersMonitor can support change monitoring size of an allocated container in NM side -- Key: YARN-1643 URL: https://issues.apache.org/jira/browse/YARN-1643 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1643-YARN-1197.4.patch, YARN-1643-YARN-1197.5.patch, YARN-1643-YARN-1197.6.patch, YARN-1643.1.patch, YARN-1643.2.patch, YARN-1643.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
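A rough sketch of the locking scheme discussed in the comment above: a {{ConcurrentHashMap}} for the tracked containers plus per-object synchronization on {{ProcessTreeInfo}}. All member names are illustrative assumptions:

{code}
// Sketch only: ConcurrentHashMap for tracked containers, per-object
// synchronization for the resource limits. Names are illustrative.
private final ConcurrentMap<ContainerId, ProcessTreeInfo> trackingContainers =
    new ConcurrentHashMap<>();

static class ProcessTreeInfo {
  private long vmemLimit;
  private long pmemLimit;
  private int cpuVcores;

  synchronized long getVmemLimit() { return vmemLimit; }
  synchronized long getPmemLimit() { return pmemLimit; }
  synchronized void setResourceLimit(long vmem, long pmem, int vcores) {
    this.vmemLimit = vmem;
    this.pmemLimit = pmem;
    this.cpuVcores = vcores;
  }
}
{code}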
[jira] [Assigned] (YARN-3991) Investigate if we need an atomic way to set both memory and CPU on Resource
[ https://issues.apache.org/jira/browse/YARN-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3991: -- Assignee: Varun Saxena Investigate if we need an atomic way to set both memory and CPU on Resource --- Key: YARN-3991 URL: https://issues.apache.org/jira/browse/YARN-3991 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Varun Saxena Labels: capacityscheduler, fairscheduler, scheduler While reviewing another patch, noticed that we have independent methods to set memory and CPU. Do we need another method to set them both atomically? Otherwise, would two threads trying to set both values lose any updates? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
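For illustration, one possible shape of an atomic setter; whether something like this is needed, or whether callers should simply avoid sharing mutable {{Resource}} instances across threads, is exactly what this JIRA asks to investigate. The method is hypothetical:

{code}
// Sketch only: a combined setter so the two fields cannot be interleaved by
// concurrent writers. Hypothetical API, not an existing Resource method.
public synchronized void setResources(int memoryMB, int vCores) {
  setMemory(memoryMB);
  setVirtualCores(vCores);
}
{code}

Note this only helps if reads and the individual setters synchronize on the same lock.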
[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646454#comment-14646454 ] Sunil G commented on YARN-3992: --- Thank you [~zjshen] for reporting the same. I will take a look into this. TestApplicationPriority.testApplicationPriorityAllocation fails intermittently -- Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Sunil G {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646339#comment-14646339 ] Zhijie Shen commented on YARN-3049: --- TestApplicationPriority.testApplicationPriorityAllocation seems to have a race condition issue. I cannot reproduce it locally, either on trunk or on YARN-2928 with this patch. Anyway, it does not seem to be related to this JIRA. Will file a separate JIRA to track the test failure. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646443#comment-14646443 ] Zhijie Shen commented on YARN-3992: --- The problem was found with jenkins build on YARN-3049: https://builds.apache.org/job/PreCommit-YARN-Build/8701/testReport/ TestApplicationPriority.testApplicationPriorityAllocation fails intermittently -- Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646452#comment-14646452 ] Sunil G commented on YARN-3992: --- Thank you [~zhi TestApplicationPriority.testApplicationPriorityAllocation fails intermittently -- Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Sunil G {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) Avoid cloning Resource in FSAppAttempt#updateDemand
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646401#comment-14646401 ] Hudson commented on YARN-2768: -- FAILURE: Integrated in Hadoop-trunk-Commit #8241 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8241/]) YARN-2768. Avoid cloning Resource in FSAppAttempt#updateDemand. (Hong Zhiguo via kasha) (kasha: rev 5205a330b387d2e133ee790b9fe7d5af3cd8bccc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java * hadoop-yarn-project/CHANGES.txt Avoid cloning Resource in FSAppAttempt#updateDemand --- Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.8.0 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
[ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646268#comment-14646268 ] Hudson commented on YARN-3950: -- FAILURE: Integrated in Hadoop-trunk-Commit #8239 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8239/]) YARN-3950. Add unique SHELL_ID environment variable to DistributedShell. Contributed by Robert Kanter (jlowe: rev 2b2bd9214604bc2e14e41e08d30bf86f512151bd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDSAppMaster.java Add unique SHELL_ID environment variable to DistributedShell Key: YARN-3950 URL: https://issues.apache.org/jira/browse/YARN-3950 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: YARN-3950.001.patch, YARN-3950.002.patch As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], it would be useful to have a monotonically increasing and independent ID of some kind that is unique per shell in the distributed shell program. We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3950) Add unique YARN_SHELL_ID environment variable to DistributedShell
[ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3950: --- Summary: Add unique YARN_SHELL_ID environment variable to DistributedShell (was: Add unique SHELL_ID environment variable to DistributedShell) Add unique YARN_SHELL_ID environment variable to DistributedShell - Key: YARN-3950 URL: https://issues.apache.org/jira/browse/YARN-3950 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: YARN-3950.001.patch, YARN-3950.002.patch As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], it would be useful to have a monotonically increasing and independent ID of some kind that is unique per shell in the distributed shell program. We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3991) Investigate if we need an atomic way to set both memory and CPU on Resource
Karthik Kambatla created YARN-3991: -- Summary: Investigate if we need an atomic way to set both memory and CPU on Resource Key: YARN-3991 URL: https://issues.apache.org/jira/browse/YARN-3991 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Karthik Kambatla While reviewing another patch, noticed that we have independent methods to set memory and CPU. Do we need another method to set them both atomically? Otherwise, would two threads trying to set both values lose any updates? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3991) Investigate if we need an atomic way to set both memory and CPU on Resource
[ https://issues.apache.org/jira/browse/YARN-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3991: --- Labels: capacityscheduler fairscheduler scheduler (was: ) Investigate if we need an atomic way to set both memory and CPU on Resource --- Key: YARN-3991 URL: https://issues.apache.org/jira/browse/YARN-3991 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Karthik Kambatla Labels: capacityscheduler, fairscheduler, scheduler While reviewing another patch, noticed that we have independent methods to set memory and CPU. Do we need another method to set them both atomically? Otherwise, would two threads trying to set both values lose any updates? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start
[ https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646415#comment-14646415 ] Varun Saxena commented on YARN-3919: [~rohithsharma], are you using {{patch -p0}} command ? NPEs' while stopping service after exception during CommonNodeLabelsManager#start - Key: YARN-3919 URL: https://issues.apache.org/jira/browse/YARN-3919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, YARN-3919.02.patch We get NPE during CommonNodeLabelsManager#serviceStop and AsyncDispatcher#serviceStop if ConnectException on call to CommonNodeLabelsManager#serviceStart occurs. {noformat} 2015-07-10 19:39:37,825 WARN main-EventThread org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) {noformat} {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) {noformat} These NPEs' fill up the logs. Although, this doesn't cause any functional issue but its a nuisance and we ideally should have null checks in serviceStop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start
[ https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646448#comment-14646448 ] Rohith Sharma K S commented on YARN-3919: - No... git apply --whitespace=fix patch-file NPEs' while stopping service after exception during CommonNodeLabelsManager#start - Key: YARN-3919 URL: https://issues.apache.org/jira/browse/YARN-3919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, YARN-3919.02.patch We get NPE during CommonNodeLabelsManager#serviceStop and AsyncDispatcher#serviceStop if ConnectException on call to CommonNodeLabelsManager#serviceStart occurs. {noformat} 2015-07-10 19:39:37,825 WARN main-EventThread org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) {noformat} {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) {noformat} These NPEs' fill up the logs. Although, this doesn't cause any functional issue but its a nuisance and we ideally should have null checks in serviceStop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646449#comment-14646449 ] zhihai xu commented on YARN-3990: - Yes, that is a good catch! {{rmContext.getRMApps()}} stores both completed and running apps. The current default value for max-completed-applications is 10000, so we may save up to 10000 NodeUpdateEvents. +1 too. AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3990.patch Whenever a node is added or removed, NodesListManager sends an RMAppNodeUpdateEvent to all the applications that are in the rmContext. But for finished/killed/failed applications it is not required to send these events. An additional check for whether the app is finished/killed/failed would minimize the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + " reported unusable"); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + " reported usable"); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error("Ignoring invalid eventtype " + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
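A minimal sketch of the extra guard the description proposes, skipping applications already in a terminal state before queueing the event; whether the actual patch uses exactly this check is not shown here:

{code}
// Sketch only: skip apps that are already finished/failed/killed before
// queuing an RMAppNodeUpdateEvent for them.
for (RMApp app : rmContext.getRMApps().values()) {
  RMAppState state = app.getState();
  if (state == RMAppState.FINISHED || state == RMAppState.FAILED
      || state == RMAppState.KILLED) {
    continue;  // completed applications do not need node updates
  }
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
          RMAppNodeUpdateType.NODE_USABLE));
}
{code}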
[jira] [Created] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
Zhijie Shen created YARN-3992: - Summary: TestApplicationPriority.testApplicationPriorityAllocation fails intermittently Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3992: - Assignee: Sunil G TestApplicationPriority.testApplicationPriorityAllocation fails intermittently -- Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Sunil G {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start
[ https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646501#comment-14646501 ] Hadoop QA commented on YARN-3919: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 6s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 10m 22s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 28s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 5s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 44s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 12s | Tests passed in hadoop-yarn-common. | | | | 51m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747812/0003-YARN-3919.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5205a33 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8706/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8706/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8706/console | This message was automatically generated. NPEs' while stopping service after exception during CommonNodeLabelsManager#start - Key: YARN-3919 URL: https://issues.apache.org/jira/browse/YARN-3919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, YARN-3919.02.patch We get NPE during CommonNodeLabelsManager#serviceStop and AsyncDispatcher#serviceStop if ConnectException on call to CommonNodeLabelsManager#serviceStart occurs. 
{noformat} 2015-07-10 19:39:37,825 WARN main-EventThread org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) {noformat} {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at
[jira] [Commented] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646566#comment-14646566 ] Anubhav Dhoot commented on YARN-3990: - Change looks good. It would be good to have a unit test to catch regressions. AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3990.patch Whenever a node is added or removed, NodesListManager sends an RMAppNodeUpdateEvent to all the applications that are in the rmContext. But for finished/killed/failed applications it is not required to send these events. An additional check for whether the app is finished/killed/failed would minimize the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + " reported unusable"); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + " reported usable"); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error("Ignoring invalid eventtype " + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
[ https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646835#comment-14646835 ] Hadoop QA commented on YARN-3983: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 3s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 24s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 8s | The applied patch generated 41 new checkstyle issues (total was 54, now 66). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 16 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 2m 0s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 54m 36s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 101m 24s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747851/YARN-3983.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c020b62 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8709/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8709/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8709/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8709/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8709/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8709/console | This message was automatically generated. Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3983.1.patch While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there's a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. 
- Increasing a container doesn't need to build/modify a resource request tree (ANY->RACK->HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After increasing a container is approved by the scheduler, it needs to update an existing container token instead of creating a new container. And there are lots of similarities when allocating different types of resources. - User-limit/queue-limit will be enforced for both of them. - Both of them need resource reservation logic. (Maybe continuous reservation looking is needed for both of them). The purpose of this JIRA is to make it easier to extend the CapacityScheduler resource allocation logic to support different types of resource allocation, to make common code reusable, and to improve code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
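Purely as a hedged illustration of the refactoring direction described above, with every name hypothetical, the allocation steps could sit behind a small allocator abstraction that both normal allocation and container increase implement:

{code}
// Sketch only: a hypothetical seam so container allocation and container
// increase can share limit checks and reservation plumbing. All names are
// invented for illustration.
interface ContainerAllocator {
  boolean checkLimits(FiCaSchedulerApp app, Resource required);   // user/queue limits
  AllocationResult tryAllocateOrReserve(FiCaSchedulerApp app,
      FiCaSchedulerNode node);                                    // type-specific logic
  void commit(AllocationResult result);                           // new container or token update
}
{code}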
[jira] [Commented] (YARN-3994) RM should respect AM resource/placement constraints
[ https://issues.apache.org/jira/browse/YARN-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646756#comment-14646756 ] Anubhav Dhoot commented on YARN-3994: - This jira should incorporate support for blacklisting done in this related jira RM should respect AM resource/placement constraints --- Key: YARN-3994 URL: https://issues.apache.org/jira/browse/YARN-3994 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Today, locality and cpu for the AM can be specified in the AM launch container request but are ignored at the RM. Locality is assumed to be ANY and cpu is dropped. There may be other things too that are ignored. This should be fixed so that the user gets what is specified in their code to launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1643) Make ContainersMonitor can support change monitoring size of an allocated container in NM side
[ https://issues.apache.org/jira/browse/YARN-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646709#comment-14646709 ] MENG DING commented on YARN-1643: - bq. IIUC, the main benefit is that we don't need to synchronize on the enforceResourceLimits call, which can be heavy, right? If that is the case, we probably also need to have proper synchronization for ResourceCalculatorProcessTree, e.g., ProcfsBasedProcessTree/WindowsBasedProcessTree? These objects could be updated by multiple threads as well. I was afraid that the code change may be too much? I think I find a way without having to synchronize on {{ResourceCalculatorProcessTree}}. All that is needed for synchronization in this class is the trackingContainers map and the access to the vmemLimit/pmemLimit/cpuVcores fields. The actual resource limit enforcement can still be handled in the {{MonitoringThread.run}} thread only. Make ContainersMonitor can support change monitoring size of an allocated container in NM side -- Key: YARN-1643 URL: https://issues.apache.org/jira/browse/YARN-1643 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1643-YARN-1197.4.patch, YARN-1643-YARN-1197.5.patch, YARN-1643-YARN-1197.6.patch, YARN-1643.1.patch, YARN-1643.2.patch, YARN-1643.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3974) Refactor the reservation system test cases to use parameterized base test
[ https://issues.apache.org/jira/browse/YARN-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3974: - Attachment: YARN-3974-v4.patch Uploading an updated patch that fixes the checkstyle issue. Refactor the reservation system test cases to use parameterized base test - Key: YARN-3974 URL: https://issues.apache.org/jira/browse/YARN-3974 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: YARN-3974-v1.patch, YARN-3974-v2.patch, YARN-3974-v3.patch, YARN-3974-v4.patch We have two test suites for testing the ReservationSystem against the Capacity and Fair schedulers. We should combine them using a parameterized reservation system base test, similar to YARN-2797. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
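As a rough illustration of the parameterized base test being proposed (class and field names assumed, not taken from the patch):

{code}
// Sketch only: run each reservation-system test against both schedulers via
// JUnit's Parameterized runner. Names are illustrative.
@RunWith(Parameterized.class)
public class TestReservationSystemWithSchedulers {

  @Parameterized.Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        { CapacityScheduler.class }, { FairScheduler.class } });
  }

  @Parameterized.Parameter
  public Class<? extends ResourceScheduler> schedulerClass;

  // Individual tests would build the RM with schedulerClass and exercise the
  // ReservationSystem against it.
}
{code}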
[jira] [Assigned] (YARN-3994) RM should respect AM resource/placement constraints
[ https://issues.apache.org/jira/browse/YARN-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-3994: --- Assignee: Varun Vasudev RM should respect AM resource/placement constraints --- Key: YARN-3994 URL: https://issues.apache.org/jira/browse/YARN-3994 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Varun Vasudev Today, locality and cpu for the AM can be specified in the AM launch container request but are ignored at the RM. Locality is assumed to be ANY and cpu is dropped. There may be other things too that are ignored. This should be fixed so that the user gets what is specified in their code to launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646510#comment-14646510 ] Rohith Sharma K S commented on YARN-3979: - Oops, 50 lakh events! I checked the attached logs; since you have attached only the ERROR logs, I was not able to trace it. One observation is that there are many InvalidStateTransition errors for the CLEAN_UP event in RMNodeImpl. # Would you possibly provide the RM logs? If you are not able to attach them to the JIRA, you could send them to me by mail. # Could you give more info, such as: what is the cluster size? How many apps are running? How many have completed? What is the state of the NodeManagers, i.e. are they running or in some other state? Which version of Hadoop are you using? Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558_104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_01 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
Zhijie Shen created YARN-3993: - Summary: Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start
[ https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-3919: Priority: Trivial (was: Major) NPEs' while stopping service after exception during CommonNodeLabelsManager#start - Key: YARN-3919 URL: https://issues.apache.org/jira/browse/YARN-3919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Trivial Fix For: 2.8.0 Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, YARN-3919.02.patch We get NPE during CommonNodeLabelsManager#serviceStop and AsyncDispatcher#serviceStop if ConnectException on call to CommonNodeLabelsManager#serviceStart occurs. {noformat} 2015-07-10 19:39:37,825 WARN main-EventThread org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) {noformat} {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) {noformat} These NPEs' fill up the logs. Although, this doesn't cause any functional issue but its a nuisance and we ideally should have null checks in serviceStop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646559#comment-14646559 ] Sunil G commented on YARN-3993: --- OK. I understood it slightly differently. With the new changes from YARN-3116, we would like to determine whether a container is the AM based on its type (not by the existing way of using the container ID). If it's fine, I can look into this. Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Labels: newbie After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3993: -- Labels: newbie (was: ) Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Labels: newbie After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646558#comment-14646558 ] Zhijie Shen edited comment on YARN-3993 at 7/29/15 6:26 PM: [~sunilg], thanks for your interest. It's not related to RM. In YARN-3116, we already build the channel to propagate the AM flag to aux service. What we need to do here is simply update the way that PerNodeTimelineCollectorsAuxService determine if the container is AM or not. Feel free to pick it up if you want to ramp up with TS v2. was (Author: zjshen): [~sunilg], thanks for your interest. It's not related to RM. In YARN-3116, we already build the channel to propagate the AM flag to aux service. What we need to do here is simply update the way that PerNodeTimelineCollectorsAuxService determine if the container is AM or not. Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Labels: newbie After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
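For reference, a minimal sketch of what the updated check in the aux service could look like, assuming ContainerContext exposes the container type added by YARN-3116 (the accessor name getContainerType() and the class name below are assumptions, not the actual patch):
{code}
import org.apache.hadoop.yarn.server.api.ContainerContext;
import org.apache.hadoop.yarn.server.api.ContainerType;

public class AmFlagCheckSketch {
  boolean isApplicationMaster(ContainerContext context) {
    // Old convention: infer the AM from the container ID (first container of
    // the attempt). New approach: trust the flag propagated to aux services.
    return context.getContainerType() == ContainerType.APPLICATION_MASTER;
  }
}
{code}
With this in place, PerNodeTimelineCollectorsAuxService would no longer need to reason about container IDs at all.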
[jira] [Commented] (YARN-3994) RM should respect AM resource/placement constraints
[ https://issues.apache.org/jira/browse/YARN-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646707#comment-14646707 ] Wangda Tan commented on YARN-3994: -- +1 for this, we should target this to 2.8. RM should respect AM resource/placement constraints --- Key: YARN-3994 URL: https://issues.apache.org/jira/browse/YARN-3994 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Today, locality and cpu for the AM can be specified in the AM launch container request but are ignored at the RM. Locality is assumed to be ANY and cpu is dropped. There may be other things too that are ignored. This should be fixed so that the user gets what is specified in their code to launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646490#comment-14646490 ] Rohith Sharma K S commented on YARN-3250: - Adding to User API discussion, the ApplicationCLI command can be {{./yarn application appId --set-priority ApplicationId --priority value}} Support admin/user cli interface in for Application Priority Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Rohith Sharma K S Current Application Priority Manager supports only configuration via file. To support runtime configurations for admin cli and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3919) NPEs' while stopping service after exception during CommonNodeLabelsManager#start
[ https://issues.apache.org/jira/browse/YARN-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646540#comment-14646540 ] Hudson commented on YARN-3919: -- FAILURE: Integrated in Hadoop-trunk-Commit #8242 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8242/]) YARN-3919. NPEs' while stopping service after exception during CommonNodeLabelsManager#start. (varun saxena via rohithsharmaks) (rohithsharmaks: rev c020b62cf8de1f3baadc9d2f3410640ef7880543) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java NPEs' while stopping service after exception during CommonNodeLabelsManager#start - Key: YARN-3919 URL: https://issues.apache.org/jira/browse/YARN-3919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: 0003-YARN-3919.patch, YARN-3919.01.patch, YARN-3919.02.patch We get NPE during CommonNodeLabelsManager#serviceStop and AsyncDispatcher#serviceStop if ConnectException on call to CommonNodeLabelsManager#serviceStart occurs. {noformat} 2015-07-10 19:39:37,825 WARN main-EventThread org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.close(FileSystemNodeLabelsStore.java:99) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:278) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:998) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1039) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) {noformat} {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) {noformat} These NPEs' fill up the logs. 
Although this doesn't cause any functional issue, it is a nuisance, and we ideally should have null checks in serviceStop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
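The fix described above amounts to null-guarding resources in the stop/close path, since stop can run after a failed start that never initialized them; a minimal sketch of the pattern (field names are illustrative, not the exact ones in FileSystemNodeLabelsStore or AsyncDispatcher):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;

public class NullSafeCloseSketch {
  private FSDataOutputStream editlogOs;
  private FileSystem fs;

  // close() may be invoked even though start() failed before these fields
  // were assigned, so guard each resource before touching it.
  public void close() throws IOException {
    if (editlogOs != null) {
      editlogOs.close();
    }
    if (fs != null) {
      fs.close();
    }
  }
}
{code}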
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646556#comment-14646556 ] Xuan Gong commented on YARN-3543: - Sorry for the late. The patch looks good overall. But we still made some un-necessary changes. * Changes made for RM side look good. * Changes on ATS side, I think that we could follow the changes from YARN-1462, which will include: ** ApplicationHistoryManagerOnTimelineStore ** TestApplicationHistoryManagerOnTimelineStore ** ApplicationMetricsConstants ** ApplicationCreatedEvent ** SystemMetricsPublisher ** TestSystemMetricsPublisher ** TimelineServer ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Sharma K S Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3994) RM should respect AM resource/placement constraints
[ https://issues.apache.org/jira/browse/YARN-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3994: - Target Version/s: 2.8.0 RM should respect AM resource/placement constraints --- Key: YARN-3994 URL: https://issues.apache.org/jira/browse/YARN-3994 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Today, locality and cpu for the AM can be specified in the AM launch container request but are ignored at the RM. Locality is assumed to be ANY and cpu is dropped. There may be other things too that are ignored. This should be fixed so that the user gets what is specified in their code to launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646676#comment-14646676 ] Jian He commented on YARN-3887: --- sounds good to me Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3993: - Assignee: Sunil G Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Sunil G Labels: newbie After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646598#comment-14646598 ] Rohith Sharma K S commented on YARN-3543: - I got what you mean!! Right.. Modifying other files like *ApplicationStartData* and others are related to applicationhistoryservice I think. Is it so? ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Sharma K S Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646679#comment-14646679 ] Hadoop QA commented on YARN-3978: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 48s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 9m 21s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 51s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 58s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 31s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 54m 33s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 105m 54s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747634/YARN-3978.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c020b62 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8707/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8707/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8707/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8707/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8707/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8707/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8707/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8707/console | This message was automatically generated. 
Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Affects Versions: 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3978.001.patch, YARN-3978.002.patch Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646700#comment-14646700 ] Bikas Saha commented on YARN-2005: -- I am fine for opening a separate jira for the specific case I mentioned. Opened YARN-3994 for that. If you want you can extend its scope to blacklisting. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch, YARN-2005.004.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3994) RM should respect AM resource/placement constraints
Bikas Saha created YARN-3994: Summary: RM should respect AM resource/placement constraints Key: YARN-3994 URL: https://issues.apache.org/jira/browse/YARN-3994 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Today, locality and cpu for the AM can be specified in the AM launch container request but are ignored at the RM. Locality is assumed to be ANY and cpu is dropped. There may be other things too that are ignored. This should be fixed so that the user gets what is specified in their code to launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646532#comment-14646532 ] Sunil G commented on YARN-3993: --- Hi [~zjshen] In SchedulerApplicationAttempt, we can use the RMContainer#isAMContainer() API to know that. It has been done this way as per YARN-2022. Could I take over this? Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3992: -- Attachment: 0001-YARN-3992.patch Thank you [~zjshen]. It seems we were not waiting for full containers to get allocated. I have now updated code so that we wait for all containers to get allocated. TestApplicationPriority.testApplicationPriorityAllocation fails intermittently -- Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Sunil G Attachments: 0001-YARN-3992.patch {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
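The general shape of such a wait, sketched with placeholder helpers (this is not the actual TestApplicationPriority change; heartbeatNode() and countNewlyAllocatedContainers() are stand-ins for the MockNM/MockAM calls the test already makes):
{code}
// Poll the heartbeat/allocate cycle until every requested container has been
// handed out or a timeout hits, then assert, so the check no longer races
// with asynchronous allocation.
int allocated = 0;
long deadline = System.currentTimeMillis() + 10000L;
while (allocated < expectedContainers && System.currentTimeMillis() < deadline) {
  heartbeatNode();                                // placeholder: trigger one more scheduling cycle
  allocated += countNewlyAllocatedContainers();   // placeholder: containers from the allocate response
  Thread.sleep(100);
}
Assert.assertEquals(expectedContainers, allocated);
{code}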
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646612#comment-14646612 ] Anubhav Dhoot commented on YARN-2005: - I think blacklisting can have lots of policies and constraints and will probably change over time. Since RMAppAttemptImpl#ScheduleTransition drops the locality constraint it seems ok for the current blacklisting to also be locality constraint unaware. Should we start simple and keep a separate jira for honoring am locality in scheduling and blacklisting at the same time? [~jianhe],[~bikassaha] let me know if you agree and I can file that jira. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch, YARN-2005.004.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646675#comment-14646675 ] Hadoop QA commented on YARN-3992: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 6m 19s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 11s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 53m 7s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 71m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747833/0001-YARN-3992.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / c020b62 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8708/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8708/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8708/console | This message was automatically generated. TestApplicationPriority.testApplicationPriorityAllocation fails intermittently -- Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen Assignee: Sunil G Attachments: 0001-YARN-3992.patch {code} java.lang.AssertionError: expected:7 but was:5 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3993: -- Affects Version/s: YARN-2928 Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Zhijie Shen Assignee: Sunil G Labels: newbie After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
[ https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3983: - Attachment: YARN-3983.1.patch Attached initial patch for review. Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3983.1.patch While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there's a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. - Increasing a container doesn't need to build/modify a resource request tree (ANY-RACK/HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After increasing a container is approved by scheduler, it need to update an existing container token instead of creating new container. And there're lots of similarities when allocating different types of resources. - User-limit/queue-limit will be enforced for both of them. - Both of them needs resource reservation logic. (Maybe continuous reservation looking is needed for both of them). The purpose of this JIRA is to make easier extending CapacityScheduler resource allocation logic to support different types of resource allocation, make common code reusable, and also better code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
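As a rough illustration of the direction described above (none of the names below come from the attached patch), the shared limit checks could live in a base allocator while the listed per-type differences become subclass hooks:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

abstract class AllocatorSketch {
  // Differs per allocation type: locality delay and request-tree bookkeeping
  // apply to new containers, but not to container increases.
  abstract boolean checkTypeSpecificConstraints();

  // Differs per allocation type: create a new container token vs. update an
  // existing container's token.
  abstract void commit();

  // Shared by both types: user-limit / queue-limit enforcement (reservation
  // handling would slot in here as well).
  final boolean tryAllocate(Resource requested, Resource userLimit, Resource queueLimit) {
    if (exceeds(requested, userLimit) || exceeds(requested, queueLimit)) {
      return false;
    }
    if (!checkTypeSpecificConstraints()) {
      return false;
    }
    commit();
    return true;
  }

  private boolean exceeds(Resource requested, Resource limit) {
    return requested.getMemory() > limit.getMemory()
        || requested.getVirtualCores() > limit.getVirtualCores();
  }
}
{code}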
[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646478#comment-14646478 ] Rohith Sharma K S commented on YARN-3250: - Hi [~sunilg] As part of this JIRA, # User API : ## I am planning to introduce {{ApplicationClientProtocol#setPriority(SetApplicationPriorityRequest)}}. *SetApplicationPriorityRequest* comprises the ApplicationId and Priority. ClientRMService invokes the API introduced by YARN-3887, i.e. updateApplicationPriority(). ## Is getPriority required on the user side? I feel that, since ApplicationReport can give the priority of an application, this API is NOT required. What do you suggest, any thoughts? # Admin API : ## As admin, one should be able to change the *cluster-max-application-priority* value. Having an rmadmin API would be great!! One issue with this API is that cluster-max-application-priority is kept in memory: when rmadmin updates it, the in-memory value can be updated, but in HA/restart cases the configuration value is taken. So I suggest storing cluster-level-application-priority in the state store and, whenever the RM is switched/restarted, giving higher preference to the store. What do you think about this approach? Apart from the above APIs, should any new APIs be added? Kindly share your thoughts. Support admin/user cli interface in for Application Priority Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Rohith Sharma K S Current Application Priority Manager supports only configuration via file. To support runtime configurations for admin cli and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
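For illustration, a minimal sketch of the request shape proposed above, assuming the record follows the usual YARN abstract-record style (the actual class, its factory methods, and the protocol signature are still to be defined in this JIRA):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;

// The request simply carries the application id and the new priority;
// ClientRMService would forward it to the scheduler through the
// updateApplicationPriority() API added by YARN-3887.
public abstract class SetApplicationPriorityRequest {
  public abstract ApplicationId getApplicationId();
  public abstract void setApplicationId(ApplicationId applicationId);
  public abstract Priority getApplicationPriority();
  public abstract void setApplicationPriority(Priority priority);
}
{code}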
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646568#comment-14646568 ] Rohith Sharma K S commented on YARN-3543: - Thanks [~xgong] for the review. bq. But we still made some un-necessary changes. Sorry, I could not get what the unnecessary changes are. Could you please explain? ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Sharma K S Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646668#comment-14646668 ] Naganarasimha G R commented on YARN-3127: - Hi [~xgong], [~gtCarrera] [~ozawa], Can any one you have a look at this jira ? Avoid timeline events during RM recovery or restart --- Key: YARN-3127 URL: https://issues.apache.org/jira/browse/YARN-3127 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0, 2.7.1 Environment: RM HA with ATS Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Critical Attachments: AppTransition.png, YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch 1.Start RM with HA and ATS configured and run some yarn applications 2.Once applications are finished sucessfully start timeline server 3.Now failover HA form active to standby 4.Access timeline server URL IP:PORT/applicationhistory //Note Earlier exception was thrown when accessed. Incomplete information is shown in the ATS web UI. i.e. attempt container and other information is not displayed. Also even if timeline server is started with RM, and on RM restart/ recovery ATS events for the applications already existing in ATS are resent which is not required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646495#comment-14646495 ] Rohith Sharma K S commented on YARN-3250: - small correction in above syntax. Correct syntax is {{./yarn application --set-priority ApplicationId --priority value}} Support admin/user cli interface in for Application Priority Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Rohith Sharma K S Current Application Priority Manager supports only configuration via file. To support runtime configurations for admin cli and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin/user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646518#comment-14646518 ] Sunil G commented on YARN-3250: --- Hi [~rohithsharma] Thank you for bringing up with api suggestions. I have few comments. bq.ApplicationClientProtocol#setPriority(SetApplicationProrityRequest) Could we use api name as {{setApplicationPriority}} bq. I suggest to store cluster-level-application-priority in store and whenever RM is switched/Restarted, give higher preference to store. I think this is a known design dilema we have in Yarn now. Once a centralized config tickets are done, we can have a clear solution. I am fine with having a priority given to RMStateStore over config file during restart. If there are no configuration changes, we can use value from yarn-site.xml. How will be the storage location path for this cluster-application-priority. I think we can group under cluster level so in future common other cluster configs can be placed if needed. bq.Apart from above API's , should there any new API's to be added? We can change default priority of a queue by changing capacity-scheduler.xml and call refreshQueues. I feel we may not need a command for that now. bq../yarn application -set-priority ApplicationId --priority value I feel we can have {{./yarn application --setPriority ApplicationId --priority value}} I was trying to sync with existing application commands {{-appStates}} {{-appTypes}} cc/[~jianhe] [~leftnoteasy] Please share your thoughts. Support admin/user cli interface in for Application Priority Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Rohith Sharma K S Current Application Priority Manager supports only configuration via file. To support runtime configurations for admin cli and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646558#comment-14646558 ] Zhijie Shen commented on YARN-3993: --- [~sunilg], thanks for your interest. It's not related to RM. In YARN-3116, we already build the channel to propagate the AM flag to aux service. What we need to do here is simply update the way that PerNodeTimelineCollectorsAuxService determine if the container is AM or not. Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646622#comment-14646622 ] Rohith Sharma K S commented on YARN-3543: - I have one doubt: whether it is able to render on the timeline web UI. I remember I made these changes so that the timeline web UI could fetch the data. Anyway, I will verify it tomorrow and confirm whether it is required. ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Sharma K S Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646893#comment-14646893 ] Li Lu commented on YARN-3049: - Hi [~zjshen], some of my comments: - The addition of {{newApp}} is to indicate if we need to update the app2flow index table. This change is an interface change and it's slightly more than I thought. However, I am still inclined to proceed with the changes in this JIRA so that we can speed up consolidating our POC patches. - FileSystemTimelineReaderImpl, in {{fillFields}}, maybe we can use EnumSet.allOf() to generate the universe of fields so that we can reuse the logic of the following for loop for Field.ALL? - Reader interface: use TimelineCollectorContext to package reader arguments? - HBaseTimelineReaderImpl: l.160 (all line numbers are after patch) {code} byte[] row = result.getRow(); {code} unused? l.213 name of private method {{getEntity}}: I think we may want to distinguish it from the external {{getEntity}} API. How about parseEntity or getEntityFromResult? We're now performing filters by ourselves in memory. I'm wondering if it would be more efficient to translate some of our filter specifications into HBase filters? l.113, 136, 142: I'm a little bit worried about the {{0L}}s. Shall we have something like DEFAULT_TIME to make the argument list more readable? I assume the problem raised in l.369 (if the event comes with no info, it will be missed) will be addressed after YARN-3984? - HBaseTimelineWriterImpl: l.121-122: The log information is unclear about the write happening onto the App2Flow table. Also, we may want to keep this message at debug level? - TimelineSchemaCreator: Why are we not adding {{a2f}} as an option, similar to what we did in l.94-102 for {{e}} and {{m}}? - App2FlowColumn: l.51, {{private}} appears to be redundant in enums. Similarly in l.42 of App2FlowColumnFamily. nits: - Name of App2FlowTable: AppToFlowTable? Saving one character every time is not quite helpful... - l. 248, 263, 336: I'm confused by the name readConnections... - Add a specific test in TestHBaseTimelineWriterImpl for App2FlowTable? [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
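On the EnumSet.allOf() suggestion, a tiny self-contained illustration of expanding Field.ALL once so the following for loop needs no special case (the Field enum is redeclared locally here; the real one lives in the timeline reader API):
{code}
import java.util.EnumSet;

public class FieldExpansionSketch {
  // Stand-in for the reader's Field enum.
  enum Field { ALL, EVENTS, INFO, METRICS, CONFIGS, RELATES_TO, IS_RELATED_TO }

  static EnumSet<Field> normalize(EnumSet<Field> requested) {
    if (requested == null || requested.contains(Field.ALL)) {
      // Expand ALL into the universe of fields; the caller's loop then treats
      // every request uniformly.
      return EnumSet.allOf(Field.class);
    }
    return requested;
  }
}
{code}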
[jira] [Commented] (YARN-3974) Refactor the reservation system test cases to use parameterized base test
[ https://issues.apache.org/jira/browse/YARN-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646895#comment-14646895 ] Hadoop QA commented on YARN-3974: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 23m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 10m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 16m 15s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 42s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 13s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 13s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 46s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 3s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 54m 55s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 112m 41s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747854/YARN-3974-v4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c020b62 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8710/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8710/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8710/console | This message was automatically generated. Refactor the reservation system test cases to use parameterized base test - Key: YARN-3974 URL: https://issues.apache.org/jira/browse/YARN-3974 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: YARN-3974-v1.patch, YARN-3974-v2.patch, YARN-3974-v3.patch, YARN-3974-v4.patch We have two test suites for testing ReservationSystem against Capacity Fair scheduler. We should combine them using a parametrized reservation system base test similar to YARN-2797 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646924#comment-14646924 ] Zhijie Shen commented on YARN-3984: --- [~vrushalic], thanks for picking it up. The aforementioned cases are definitely good to support, while the current query we want to support now (in YARN-3051 and YARN-3049) is to retrieve all events belonging to an entity (e.g. application, attempt, container and etc.). With this basic query, we can easily distill the details that happen to the entity, such as the diagnostic msg of the kill event. In this case, the most efficient way is to put timestamp even before the event ID, so that we don't need to order the events in memory. In addition to the key composition, I find another significant problem with the event store schema. If the event doesn't contain any info, it will be ignored then. And we cannot always guarantee user will put something into info. For example, user may define a KILL event without any diagnostic msg. Rethink event column key issue -- Key: YARN-3984 URL: https://issues.apache.org/jira/browse/YARN-3984 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Fix For: YARN-2928 Currently, the event column key is event_id?info_key?timestamp, which is not so friendly to fetching all the events of an entity and sorting them in a chronologic order. IMHO, timestamp?event_id?info_key may be a better key schema. I open this jira to continue the discussion about it which was commented on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
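To make the ordering argument concrete, a toy sketch of the two qualifier layouts (the separator and the fixed-width inverted-timestamp encoding are illustrative; the real schema uses the timeline storage's own separators and encoders):
{code}
public class EventKeySketch {
  private static final String SEP = "!";  // illustrative separator

  // Current layout: event_id first. Columns of one event sit together, but a
  // chronological view of the entity's events needs re-sorting in memory.
  static String currentKey(String eventId, String infoKey, long ts) {
    return eventId + SEP + infoKey + SEP + invert(ts);
  }

  // Proposed layout: timestamp first. A plain scan of the entity's event
  // columns already comes back newest-first, so "all events of an entity"
  // needs no in-memory sort.
  static String proposedKey(String eventId, String infoKey, long ts) {
    return invert(ts) + SEP + eventId + SEP + infoKey;
  }

  // Fixed-width inverted timestamp so lexicographic order matches reverse
  // chronological order.
  private static String invert(long ts) {
    return String.format("%019d", Long.MAX_VALUE - ts);
  }
}
{code}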
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646930#comment-14646930 ] Sangjin Lee commented on YARN-3984: --- {quote} In addition to the key composition, I find another significant problem with the event store schema. If the event doesn't contain any info, it will be ignored then. And we cannot always guarantee user will put something into info. For example, user may define a KILL event without any diagnostic msg. {quote} Thanks for spotting that issue [~zjshen]. That's definitely a huge issue. We should address that as part of this JIRA... Rethink event column key issue -- Key: YARN-3984 URL: https://issues.apache.org/jira/browse/YARN-3984 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Fix For: YARN-2928 Currently, the event column key is event_id?info_key?timestamp, which is not so friendly to fetching all the events of an entity and sorting them in a chronologic order. IMHO, timestamp?event_id?info_key may be a better key schema. I open this jira to continue the discussion about it which was commented on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.005.patch Refreshed my patch according to [~sjlee0]'s comments. Specifically, I set up a new interface (OfflineAggregationWriter) for aggregation writers. With this new interface I decoupled PhoenixOfflineAggregationWriter from TimelineWriter. Having a separate offline writer interface also gives us more freedom to design the aggregation storage interface. Now in the new writer API the type of the offline aggregation is specified by the incoming {{OfflineAggregationInfo}}. I also considered to combine reader and writer interfaces into a OfflineAggregationStorage interface, but it turned out that we may have some reader-only implementations (such as reading app level aggregations from HBase). Separating offline readers and writers will give us more freedom in this case. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
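A rough sketch of the separation described above, with all names illustrative rather than taken from the attached patch:
{code}
import java.io.IOException;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntities;

// Offline aggregation writers get their own base type instead of implementing
// TimelineWriter; the aggregation granularity travels with each call instead
// of being baked into the writer.
abstract class OfflineAggregationWriterSketch extends AbstractService {
  OfflineAggregationWriterSketch(String name) {
    super(name);
  }

  // Placeholder for the OfflineAggregationInfo mentioned above: it would name
  // the target table and granularity (e.g. flow-level vs. user-level).
  static final class AggregationInfoSketch {
    final String tableName;
    AggregationInfoSketch(String tableName) {
      this.tableName = tableName;
    }
  }

  abstract void writeAggregatedEntities(AggregationInfoSketch info,
      TimelineEntities entities) throws IOException;
}
{code}
Keeping the reader side separate, as the comment notes, also leaves room for reader-only implementations such as reading app-level aggregations straight from HBase.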
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647023#comment-14647023 ] Hadoop QA commented on YARN-3814: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747894/YARN-3814.reference.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ddc867ce | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8712/console | This message was automatically generated. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3994) RM should respect AM resource/placement constraints
[ https://issues.apache.org/jira/browse/YARN-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3994: Assignee: (was: Varun Vasudev) RM should respect AM resource/placement constraints --- Key: YARN-3994 URL: https://issues.apache.org/jira/browse/YARN-3994 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Today, locality and cpu for the AM can be specified in the AM launch container request but are ignored at the RM. Locality is assumed to be ANY and cpu is dropped. There may be other things too that are ignored. This should be fixed so that the user gets what is specified in their code to launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646935#comment-14646935 ] Zhijie Shen commented on YARN-3984: --- In fact, metric has the same problem, but it may be still okay to ignore a metric without any data. Rethink event column key issue -- Key: YARN-3984 URL: https://issues.apache.org/jira/browse/YARN-3984 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Fix For: YARN-2928 Currently, the event column key is event_id?info_key?timestamp, which is not so friendly to fetching all the events of an entity and sorting them in a chronologic order. IMHO, timestamp?event_id?info_key may be a better key schema. I open this jira to continue the discussion about it which was commented on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646975#comment-14646975 ] Wangda Tan commented on YARN-3945: -- And forgot to mention, maxApplicationsPerUser computation is a byproduct of user-limit, I would like to see if we can reach some consent about change/not-change user-limit before fixing maxApplicationPerUser based on existing user-limit assumptions. maxApplicationsPerUser is wrongly calculated Key: YARN-3945 URL: https://issues.apache.org/jira/browse/YARN-3945 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch maxApplicationsPerUser is currently calculated based on the formula {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor)}} but description of userlimit is {quote} Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value.{color:red} The the former (the minimum value) is set to this property value {color} and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer. {quote} configuration related to minimum limit should not be made used in a formula to calculate max applications for a user -- This message was sent by Atlassian JIRA (v6.3.4#6332)
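Plugging illustrative numbers into the current formula shows the mismatch the JIRA calls out: the minimum-share percentage caps a user's applications even though a lone user may be allowed to occupy the whole queue.
{code}
// Illustrative numbers only.
int maxApplications = 10000;   // maximum applications allowed in the queue
int userLimit = 25;            // minimum-user-limit-percent, i.e. the 25% minimum guarantee
float userLimitFactor = 1f;

int maxApplicationsPerUser =
    (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);
// maxApplicationsPerUser == 2500, even though a single active user may be
// entitled to use up to 100% of the queue's resources.
{code}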
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647017#comment-14647017 ] Naganarasimha G R commented on YARN-3995: - Two approaches have been discussed so far: # We can have a timer task which cleans up the collector after some grace period, rather than removing it immediately when the AM container finishes. # When the RM finishes the attempt, it can send one finish event through the timelineclient for the ApplicationEntity, which is a kind of marker that the NM's TimelineCollectorManager can act upon. Some of the NM events are not getting published due race condition when AM container finishes in NM Key: YARN-3995 URL: https://issues.apache.org/jira/browse/YARN-3995 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Affects Versions: YARN-2928 Reporter: Naganarasimha G R Assignee: Naganarasimha G R As discussed in YARN-3045: while testing with TestDistributedShell it was found that a few of the container metrics events were failing because of a race condition. When the AM container finishes and the collector for the app is removed, there is still a possibility that events published for the app by the current NM and other NMs are still in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
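A minimal sketch of the first approach, with hypothetical names and an assumed grace period: instead of removing the app's collector as soon as the AM container finishes, the removal is scheduled after a delay so in-flight events from this NM and other NMs still find a live collector.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CollectorCleanupSketch {
  private static final long GRACE_PERIOD_SECONDS = 60; // assumed; would be configurable
  private final ScheduledExecutorService cleaner =
      Executors.newSingleThreadScheduledExecutor();

  void onAmContainerFinished(final String appId) {
    // delay the removal instead of doing it immediately
    cleaner.schedule(new Runnable() {
      @Override
      public void run() {
        removeCollector(appId);
      }
    }, GRACE_PERIOD_SECONDS, TimeUnit.SECONDS);
  }

  void removeCollector(String appId) {
    System.out.println("removing collector for " + appId);
  }
}
{code}
The second approach would instead key the cleanup off an explicit finish marker from the RM, which avoids having to guess a delay.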
[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646964#comment-14646964 ] Wangda Tan commented on YARN-3945: -- Thanks for summarizing [~Naganarasimha]. I think we *might* need to reconsider the user-limit / user-limit-factor configuration. I can also see that it is hard to understand: - User-limit is neither a lower bound nor an upper bound. - User-limit is not a fairness mechanism to balance resources between users; instead, it can lead to bad imbalance. For example, if we set user-limit = 50 and there are 10 users running, we cannot control how much resource each user can use. - It's really hard to understand: I work on CapacityScheduler almost every day, but sometimes I forget and need to look at the code to see how it is computed. :-( Basically user-limit is computed by {{user-limit = min(queue-capacity * user-limit-factor, current-capacity * max(user-limit / 100, 1 / #active-users))}}. But this formula is not that meaningful since #active-users changes every minute; it is not a predictable formula. Instead we may need to consider some notion like fair sharing: user-limit-factor becomes the max-resource-limit of each user, and user-limit-percentage becomes something like guaranteed-concurrent-#users; when #users exceeds guaranteed-concurrent-#users, the remaining users can only get idle shares. With this approach, and considering we have user-limit preemption within a queue (YARN-2113), we can get a predictable user-limit. Thoughts? [~nroberts], [~jlowe]. maxApplicationsPerUser is wrongly calculated Key: YARN-3945 URL: https://issues.apache.org/jira/browse/YARN-3945 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch maxApplicationsPerUser is currently calculated based on the formula {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor)}} but description of userlimit is {quote} Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value.{color:red} The the former (the minimum value) is set to this property value {color} and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer. {quote} configuration related to minimum limit should not be made used in a formula to calculate max applications for a user -- This message was sent by Atlassian JIRA (v6.3.4#6332)
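To make the unpredictability concrete, here is the formula quoted above evaluated with made-up numbers (queue and current capacity of 100 units, user-limit of 50%, user-limit-factor of 1, 10 active users). Each user's limit comes out to 50, so two busy users can legitimately occupy the whole queue while the other eight get nothing, which is exactly the imbalance described in the comment.
{code}
public class UserLimitExample {
  public static void main(String[] args) {
    double queueCapacity = 100;     // assumed units
    double currentCapacity = 100;   // assumed units
    double userLimitFactor = 1;
    double userLimitPercent = 50;
    int activeUsers = 10;
    double userLimit = Math.min(queueCapacity * userLimitFactor,
        currentCapacity * Math.max(userLimitPercent / 100.0, 1.0 / activeUsers));
    System.out.println(userLimit); // prints 50.0: any single user may take half the queue
  }
}
{code}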
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646992#comment-14646992 ] Naganarasimha G R commented on YARN-3045: - Thanks for the comments [~djp], bq. We already have a new flush() API now for the writer that was checked in with YARN-3949... You are right that we are lacking an API to respect this priority/policy in the whole data flow for writing. I will file another JIRA to track this. I went through the discussions and the patch of YARN-3949. I feel calling two APIs would not be very user friendly, and how will the users of TimelineClient call flush? I think that is not captured in YARN-3949. bq. Anyway, I would support the scope (container events + foundation work) you proposed here in case you are comfortable with it. I am fine with a single jira, but the only trouble is that as the scope increases there will be more delay in the jira, as more discussion will be required (in this case, which entity to publish NM app localization events under). Also, since I have been holding this jira for a long time, I thought of getting the basic part out and developing on top of it. I am ok if you want to avoid multiple jiras. bq. That's a good question. My initial thinking is we could need something like a NodemanagerEntity to store application events, resource localization events, log aggregation handling events, configuration, etc. However, I would like to hear your and other guys' ideas on this as well. We had a discussion on this topic today in the meeting and [~sjlee0] was of the opinion not to have another entity here. I think we need more discussion on this, as it involves querying too. The approach I can think of is: * Application-level events in the NM can go under the ApplicationEntity, and the EventID can have the event type (INIT_APPLICATION/APPLICATION_FINISHED/APPLICATION_LOG_HANDLING_FAILED) and the NM_ID. * For localization, I feel it can go under the ContainerEntity, and the EventID can have the event type (REQUEST, LOCALIZED, LOCALIZATION_FAILED) and the PATH of the localized resource. bq. IMO, the 2nd approach (hook into the existing event dispatcher) looks simpler and straightforward. This approach is straightforward, but I am not sure whether it might have an impact (just initial apprehensions). I will start implementing it for container events and share an initial patch based on this approach. [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
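One way to encode the EventID proposal in the comment above is sketched here; the separator and helper names are assumptions for illustration, not part of any patch.
{code}
class NmEventIdSketch {
  private static final String SEP = "!"; // assumed separator

  // application-level NM events under the ApplicationEntity: event type plus NM id
  static String appEventId(String eventType, String nmId) {
    return eventType + SEP + nmId;            // e.g. APPLICATION_FINISHED!host:port
  }

  // localization events under the ContainerEntity: event type plus localized path
  static String localizationEventId(String eventType, String resourcePath) {
    return eventType + SEP + resourcePath;    // e.g. LOCALIZED!<path-to-resource>
  }
}
{code}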
[jira] [Created] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305
Anubhav Dhoot created YARN-3996: --- Summary: YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305 Key: YARN-3996 URL: https://issues.apache.org/jira/browse/YARN-3996 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest with minimumResource for the incrementResource. This causes normalize to return zero if the minimum is set to zero as per YARN-789. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
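A simplified illustration of why this breaks zero capabilities (this is not the real Resources/ResourceCalculator code; the helper and values are assumptions): normalization rounds a request up to a multiple of the increment, so if the zero minimum resource is passed in as the increment, the rounded value collapses to zero and the request is effectively wiped out.
{code}
class NormalizeSketch {
  static int roundUp(int value, int step) {
    if (step == 0) {
      return 0; // degenerate case hit when the zero minimum is used as the increment
    }
    return ((value + step - 1) / step) * step;
  }

  public static void main(String[] args) {
    System.out.println(roundUp(1000, 512)); // 1024: normal rounding to the increment
    System.out.println(roundUp(1000, 0));   // 0: a zero "increment" zeroes the request
  }
}
{code}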
[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers
[ https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647045#comment-14647045 ] Arun Suresh commented on YARN-3920: --- Thanks for the patch [~adhoot]. The patch looks pretty straightforward to me and the test case looks good. My only minor comment is: maybe we can expose this as an absolute value rather than a ratio, and the {{isReservable()}} function would just take min(ReservationThreshold, MaxCapability). I am ok either way though. +1 pending the above decision. FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers Key: YARN-3920 URL: https://issues.apache.org/jira/browse/YARN-3920 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: yARN-3920.001.patch Reserving a node for a container was designed to prevent large containers from being starved by small requests that keep getting onto a node. Today we let this be used even for a small container request. This has a huge impact on scheduling since we block other scheduling requests until that reservation is fulfilled. We should make this configurable so its impact can be minimized by limiting it to large container requests as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
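A sketch of the alternative Arun suggests, with hypothetical names and memory-only resources for brevity: the configured threshold is an absolute value, capped at the scheduler's maximum allocation, and a node is only reserved for requests at least that large.
{code}
class ReservationThresholdSketch {
  // requestMb: size of the container request; thresholdMb: configured absolute
  // threshold; maxAllocationMb: the scheduler's maximum allocation
  static boolean isReservable(long requestMb, long thresholdMb, long maxAllocationMb) {
    long effectiveThreshold = Math.min(thresholdMb, maxAllocationMb);
    return requestMb >= effectiveThreshold;
  }
}
{code}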
[jira] [Commented] (YARN-3974) Refactor the reservation system test cases to use parameterized base test
[ https://issues.apache.org/jira/browse/YARN-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647070#comment-14647070 ] Anubhav Dhoot commented on YARN-3974: - LGTM Refactor the reservation system test cases to use parameterized base test - Key: YARN-3974 URL: https://issues.apache.org/jira/browse/YARN-3974 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: YARN-3974-v1.patch, YARN-3974-v2.patch, YARN-3974-v3.patch, YARN-3974-v4.patch We have two test suites for testing ReservationSystem against the Capacity and Fair schedulers. We should combine them using a parametrized reservation system base test, similar to YARN-2797. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
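For readers unfamiliar with the pattern, here is a hedged illustration of what a parameterized base test looks like in JUnit 4; the class name and scheduler handling are made up and this is not the YARN-3974 patch, it only shows how the same tests run once per scheduler parameter.
{code}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class ReservationSystemBaseTestSketch {

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] { {"CapacityScheduler"}, {"FairScheduler"} });
  }

  private final String schedulerName;

  public ReservationSystemBaseTestSketch(String schedulerName) {
    this.schedulerName = schedulerName;
  }

  @Test
  public void testReservationSystemInitializes() {
    // a real base test would bring up an RM configured with the given scheduler here
    System.out.println("running reservation system tests against " + schedulerName);
  }
}
{code}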
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647095#comment-14647095 ] zhangyubiao commented on YARN-3979: --- The cluster has about 1600 nodes, with about 550 apps running and 2 lakh (200,000) apps completed. At one point all the NodeManagers were lost and then recovered a moment later. I use Hadoop-2.2.0 on CentOS 6.5. Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558 _104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_0 1 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647132#comment-14647132 ] Li Lu commented on YARN-3904: - The two failed tests passed on my local machine, and the failures appeared to be unrelated. That said, we may still need to fix those intermittent test failures. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647130#comment-14647130 ] zhangyubiao commented on YARN-3979: --- I have sent you an email with the RM jstack log, and I will send you the app log soon. Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558 _104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_0 1 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647163#comment-14647163 ] Sangjin Lee commented on YARN-3045: --- bq. I went through the discussions and the patch of YARN-3949. I feel calling two APIs would not be very user friendly, and how will the users of TimelineClient call flush? I think that is not captured in YARN-3949. We did discuss it in that JIRA. See [this comment|https://issues.apache.org/jira/browse/YARN-3949?focusedCommentId=14640959page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14640959] for instance. Note that the user of those two methods is really {{TimelineCollector}}. I don't think we'd be exposing {{flush()}} to {{TimelineClient}}. The synchronous nature of the writes would be expressed differently on {{TimelineClient}}. bq. We had a discussion on this topic today in the meeting and Sangjin Lee was of the opinion not to have another entity here. I think we need more discussion on this, as it involves querying too. To elaborate a little further, creating a new entity type just to capture different origins of application events seems a bit too much. These are really events that belong to YARN applications, and I don't see why they shouldn't be part of the YARN application entities. It also simplifies the query model. When you query for a YARN application entity, you get all application events, regardless of whether they originate from the RM or NMs. That's a much nicer interaction when querying for a YARN app. [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647075#comment-14647075 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 7s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 8m 25s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 11m 29s | The applied patch generated 4 additional warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 45s | The applied patch generated 1 new checkstyle issues (total was 237, now 237). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 49s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 8m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 10s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 6m 33s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 55m 5s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 121m 34s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747878/YARN-2884-V6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c020b62 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8711/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8711/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8711/console | This message was automatically generated. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647177#comment-14647177 ] Rohith Sharma K S commented on YARN-3979: - Thanks for the information!! bq. At one point all the NodeManagers were lost and then recovered a moment later I can think of a scenario very close to YARN-3990. Since you have 2 lakh apps completed and 1600 NodeManagers, when all the nodes were lost and reconnected, the number of events generated is {{(2 lakh completed + 550 running = 200550) * 1600 (number of NodeManagers) = 320,880,000}} events.. Oops!!! Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558 _104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_0 1 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647074#comment-14647074 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 36s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 14s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 46s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 58s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 44m 48s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl | | | hadoop.yarn.server.timelineservice.storage.TestPhoenixOfflineAggregationWriterImpl | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747900/YARN-3904-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / df0ec47 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8713/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8713/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8713/console | This message was automatically generated. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
[ https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647150#comment-14647150 ] Jian He commented on YARN-3983: --- Thanks Wangda! Some comments on the patch: - ApplicationResourceAllocator -> ContainerAllocator - NewContainerAllocator -> RegularContainerAllocator - internalPreAllocation -> preAllocate - move the assignContainersOnNode into internalApplyAllocation - internalApplyAllocation -> doAllocation - doAllocation -> allocate - AllocatorAllocationResult -> ContainerAllocation - SKIPPED_APP -> SKIP_APP; similarly for others - this.resourceToBeAllocated can be set null; the caller can check whether null or not {code} if (resourceToBeAllocated == null) { this.resourceToBeAllocated = Resources.none(); } else { this.resourceToBeAllocated = resourceToBeAllocated; } {code} - AllocatorAllocationResult#allocateNodeType -> AllocatorAllocationResult#containerNodeType - Fix FiCaSchedulerApp#assignContainer method format and remove the unused createdContainer parameter - handleNewContainerReservation does not need to be a separate method; - getCSAssignmentFromAllocateResult can be part of doAllocation. Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3983.1.patch While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there's a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. - Increasing a container doesn't need to build/modify a resource request tree (ANY-RACK/HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After increasing a container is approved by scheduler, it need to update an existing container token instead of creating new container. And there're lots of similarities when allocating different types of resources. - User-limit/queue-limit will be enforced for both of them. - Both of them needs resource reservation logic. (Maybe continuous reservation looking is needed for both of them). The purpose of this JIRA is to make easier extending CapacityScheduler resource allocation logic to support different types of resource allocation, make common code reusable, and also better code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected
Rohith Sharma K S created YARN-3990: --- Summary: AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Priority: Critical Whenever node is added or removed, NodeListManager sends RMAppNodeUpdateEvent to all the applications that are in the rmcontext. But for finished/killed/failed applications it is not required to send these events. Additional check for wheather app is finished/killed/failed would minimizes the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + reported unusable); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + reported usable); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error(Ignoring invalid eventtype + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
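A minimal sketch of the check suggested in the description, using the names from the snippet above plus {{RMApp#getState}} and {{RMAppState}} (shown for the NODE_USABLE branch only; treat it as an illustration of the idea, not the attached patch): completed applications are skipped, so a node reconnect no longer floods the dispatcher with updates nobody consumes.
{code}
for (RMApp app : rmContext.getRMApps().values()) {
  RMAppState state = app.getState();
  if (state == RMAppState.FINISHED || state == RMAppState.FAILED
      || state == RMAppState.KILLED) {
    continue; // finished/failed/killed apps do not need node update events
  }
  rmContext.getDispatcher().getEventHandler().handle(
      new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
          RMAppNodeUpdateType.NODE_USABLE));
}
{code}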
[jira] [Commented] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645795#comment-14645795 ] Bibin A Chundatt commented on YARN-3990: [~rohithsharma] Yes,Currently i have submitted about 50K+ apps and {{yarn.resourcemanager.max-completed-applications}} is set to 20K. {{yarn.resourcemanager.state-store.max-completed-applications}} default ={{yarn.resourcemanager.max-completed-applications}} AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3990.patch Whenever node is added or removed, NodeListManager sends RMAppNodeUpdateEvent to all the applications that are in the rmcontext. But for finished/killed/failed applications it is not required to send these events. Additional check for wheather app is finished/killed/failed would minimizes the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + reported unusable); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + reported usable); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error(Ignoring invalid eventtype + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645601#comment-14645601 ] zhangyubiao commented on YARN-3979: --- Thank you for reply @Rohith Sharma K S Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558 _104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_0 1 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangyubiao updated YARN-3979: -- Attachment: ERROR103.log The RM log is very large, so I grepped for ERROR in the logs. Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558 _104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_0 1 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-3990: Summary: AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected (was: AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected ) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Whenever node is added or removed, NodeListManager sends RMAppNodeUpdateEvent to all the applications that are in the rmcontext. But for finished/killed/failed applications it is not required to send these events. Additional check for wheather app is finished/killed/failed would minimizes the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + reported unusable); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + reported usable); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error(Ignoring invalid eventtype + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reassigned YARN-3990: -- Assignee: Bibin A Chundatt AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Whenever node is added or removed, NodeListManager sends RMAppNodeUpdateEvent to all the applications that are in the rmcontext. But for finished/killed/failed applications it is not required to send these events. Additional check for wheather app is finished/killed/failed would minimizes the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + reported unusable); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + reported usable); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error(Ignoring invalid eventtype + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3990: --- Attachment: 0001-YARN-3990.patch [~rohithsharma] Attaching patch for initial review AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3990.patch Whenever node is added or removed, NodeListManager sends RMAppNodeUpdateEvent to all the applications that are in the rmcontext. But for finished/killed/failed applications it is not required to send these events. Additional check for wheather app is finished/killed/failed would minimizes the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + reported unusable); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + reported usable); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error(Ignoring invalid eventtype + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645535#comment-14645535 ] Rohith Sharma K S commented on YARN-3979: - How many applications have completed? How many applications are running? How many NMs are running? When does the event queue become full? Any observations you have made? Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao 2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558 _104282_01_01 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE) 2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_0 1 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645567#comment-14645567 ] Bibin A Chundatt commented on YARN-3990: [~rohithsharma] {code} 2015-07-29 19:39:03,409 | INFO | ResourceManager Event Processor | Added node host-7:26009 clusterResource: memory:178400, vCores:64 | CapacityScheduler.java:1358 2015-07-29 19:39:03,409 | INFO | AsyncDispatcher event handler | Size of event-queue is 3000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,409 | DEBUG | Socket Reader #1 for port 26003 | got #2125 | Server.java:1790 2015-07-29 19:39:03,409 | DEBUG | IPC Server handler 7 on 26003 | IPC Server handler 7 on 26003: org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 172.168.100.7:24999 Call#2125 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER | Server.java:2058 2015-07-29 19:39:03,410 | DEBUG | IPC Server handler 7 on 26003 | PrivilegedAction as:mapred/hadoop.hadoop@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082) | UserGroupInformation.java:1696 2015-07-29 19:39:03,410 | INFO | AsyncDispatcher event handler | Size of event-queue is 4000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,410 | INFO | AsyncDispatcher event handler | Size of event-queue is 5000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,411 | INFO | AsyncDispatcher event handler | Size of event-queue is 6000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,412 | INFO | AsyncDispatcher event handler | Size of event-queue is 7000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,412 | INFO | IPC Server handler 7 on 26003 | Size of event-queue is 7000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,412 | INFO | AsyncDispatcher event handler | Size of event-queue is 8000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,413 | INFO | AsyncDispatcher event handler | Size of event-queue is 9000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,414 | INFO | AsyncDispatcher event handler | Size of event-queue is 1 | AsyncDispatcher.java:235 2015-07-29 19:39:03,414 | INFO | AsyncDispatcher event handler | Size of event-queue is 11000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,415 | DEBUG | IPC Server handler 7 on 26003 | Served: nodeHeartbeat queueTime= 1 procesingTime= 5 | ProtobufRpcEngine.java:631 2015-07-29 19:39:03,415 | INFO | AsyncDispatcher event handler | Size of event-queue is 12000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,416 | DEBUG | IPC Server handler 7 on 26003 | Adding saslServer wrapped token of size 100 as call response. | Server.java:2460 2015-07-29 19:39:03,416 | DEBUG | IPC Server handler 7 on 26003 | IPC Server handler 7 on 26003: responding to org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 172.168.100.7:24999 Call#2125 Retry#0 | Server.java:994 2015-07-29 19:39:03,416 | INFO | AsyncDispatcher event handler | Size of event-queue is 13000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,416 | DEBUG | IPC Server handler 7 on 26003 | IPC Server handler 7 on 26003: responding to org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 172.168.100.7:24999 Call#2125 Retry#0 Wrote 118 bytes. 
| Server.java:1013 2015-07-29 19:39:03,416 | INFO | AsyncDispatcher event handler | Size of event-queue is 14000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,417 | INFO | AsyncDispatcher event handler | Size of event-queue is 15000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,418 | INFO | AsyncDispatcher event handler | Size of event-queue is 16000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,419 | INFO | AsyncDispatcher event handler | Size of event-queue is 17000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,419 | INFO | AsyncDispatcher event handler | Size of event-queue is 18000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,420 | INFO | AsyncDispatcher event handler | Size of event-queue is 19000 | AsyncDispatcher.java:235 2015-07-29 19:39:03,421 | INFO | AsyncDispatcher event handler | Size of event-queue is 2 | AsyncDispatcher.java:235 2015-07-29 19:39:03,421 | DEBUG | AsyncDispatcher event handler | Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppNodeUpdateEvent.EventType: NODE_UPDATE | AsyncDispatcher.java:166 2015-07-29 19:39:03,421 | DEBUG | AsyncDispatcher event handler | Processing event for application_1438101193238_224125 of type NODE_UPDATE | RMAppImpl.java:741 2015-07-29 19:39:03,421 | DEBUG | AsyncDispatcher event handler | Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppNodeUpdateEvent.EventType: NODE_UPDATE | AsyncDispatcher.java:166 2015-07-29 19:39:03,421 | DEBUG | AsyncDispatcher event handler | Processing event for application_1438101193238_224126 of type NODE_UPDATE | RMAppImpl.java:741 2015-07-29 19:39:03,422 | DEBUG | AsyncDispatcher event handler |
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645566#comment-14645566 ] Rohith Sharma K S commented on YARN-3887: - Your understanding is correct. I meant to say we could have a new synchronous API like {{updateApplicationStateSynchronizly}} in RMStateStore. [~jianhe] what do you think about having a new synchronous API in RMStateStore? Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3990) AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645583#comment-14645583 ] Rohith Sharma K S commented on YARN-3990: - thanks [~bibinchundatt] for reproducing the issue. I believe in you clustesr appsCompleted/appsRunning are 2 and max number of completed apps to keep is set to 20k? AsyncDispatcher may overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Whenever node is added or removed, NodeListManager sends RMAppNodeUpdateEvent to all the applications that are in the rmcontext. But for finished/killed/failed applications it is not required to send these events. Additional check for wheather app is finished/killed/failed would minimizes the unnecessary events {code} public void handle(NodesListManagerEvent event) { RMNode eventNode = event.getNode(); switch (event.getType()) { case NODE_UNUSABLE: LOG.debug(eventNode + reported unusable); unusableRMNodesConcurrentSet.add(eventNode); for(RMApp app: rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_UNUSABLE)); } break; case NODE_USABLE: if (unusableRMNodesConcurrentSet.contains(eventNode)) { LOG.debug(eventNode + reported usable); unusableRMNodesConcurrentSet.remove(eventNode); } for (RMApp app : rmContext.getRMApps().values()) { this.rmContext .getDispatcher() .getEventHandler() .handle( new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode, RMAppNodeUpdateType.NODE_USABLE)); } break; default: LOG.error(Ignoring invalid eventtype + event.getType()); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3945: Attachment: YARN-3945.20150729-1.patch Oops My Mistake, corrected the patch to remove javac warnings... maxApplicationsPerUser is wrongly calculated Key: YARN-3945 URL: https://issues.apache.org/jira/browse/YARN-3945 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch maxApplicationsPerUser is currently calculated based on the formula {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor)}} but description of userlimit is {quote} Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value.{color:red} The the former (the minimum value) is set to this property value {color} and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer. {quote} configuration related to minimum limit should not be made used in a formula to calculate max applications for a user -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3989) Show messages only for NodeLabel commands in RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645870#comment-14645870 ] Naganarasimha G R commented on YARN-3989: - Hi [~bibinchundatt] [~sunilg], is the purpose of this jira to remove the stack trace and show only the exception message? If so, I think we would need to handle this in multiple places for multiple commands, not just for RMAdminCLI and the NodeLabel commands. Also, it can sometimes become difficult for developers to look into an issue and resolve it if the stack trace is removed. I think we should have some flexible way: in operations a verbose trace is not wanted, but during development it is helpful. Thoughts? Show messages only for NodeLabel commands in RMAdminCLI Key: YARN-3989 URL: https://issues.apache.org/jira/browse/YARN-3989 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Currently, for a nodelabel command execution failure, the full exception stack trace is shown. This jira is to handle exceptions and show only messages. As per the discussion in YARN-3963 [~sunilg] {quote} As I see it, I can see the full exception stack trace on the client side in this case (also in case of other commands too) and it's too verbose. I think we can make it compact and then it will be more easily readable. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
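A sketch of the kind of handling being discussed, with assumed names ({{runNodeLabelCommand}}, {{verbose}}) rather than the actual RMAdminCLI code: print only the exception message by default, and keep the full stack trace behind a verbose/debug switch so developers can still get it when needed.
{code}
try {
  runNodeLabelCommand(args);
} catch (Exception e) {
  if (verbose) {
    e.printStackTrace(System.err);        // full detail for development/debugging
  } else {
    System.err.println(e.getMessage());   // compact message for operators
  }
}
{code}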
[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645959#comment-14645959 ] Hadoop QA commented on YARN-3945: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 48s | The applied patch generated 1 new checkstyle issues (total was 92, now 91). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 13s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 90m 5s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747768/YARN-3945.20150729-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6374ee0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8704/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8704/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8704/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8704/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8704/console | This message was automatically generated. maxApplicationsPerUser is wrongly calculated Key: YARN-3945 URL: https://issues.apache.org/jira/browse/YARN-3945 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch maxApplicationsPerUser is currently calculated based on the formula {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor)}} but description of userlimit is {quote} Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value.{color:red} The the former (the minimum value) is set to this property value {color} and the latter (the maximum value) depends on the number of users who have submitted applications. 
For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer. {quote} A configuration value that describes the minimum limit should not be used in a formula that calculates the maximum number of applications for a user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
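To make the miscalculation concrete, the following is a small, purely illustrative Java sketch (an editor's addition, not code from the patch; the class name and the configuration values are assumptions, using the 25% example from the quoted description and an assumed queue-wide maximum of 10000 applications):
{code}
public class MaxAppsPerUserExample {
  public static void main(String[] args) {
    // Assumed configuration values, not taken from the patch.
    int maxApplications = 10000;   // queue-wide maximum applications (assumed)
    int userLimit = 25;            // minimum-user-limit-percent, i.e. a MINIMUM guarantee
    float userLimitFactor = 1.0f;  // user-limit-factor

    // The formula quoted in the issue description.
    int maxApplicationsPerUser =
        (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);

    // Prints "maxApplicationsPerUser = 2500": the cap is derived from the
    // minimum guarantee, even though a lone user in the queue may hold up
    // to 100% of its resources according to the quoted description.
    System.out.println("maxApplicationsPerUser = " + maxApplicationsPerUser);
  }
}
{code}
Under these assumed values a single user is capped at 2500 applications even when nobody else has submitted to the queue, which is the mismatch between the formula and the documented user-limit semantics.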
[jira] [Commented] (YARN-3990) AsyncDispatcher may be overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected
[ https://issues.apache.org/jira/browse/YARN-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645981#comment-14645981 ] Hadoop QA commented on YARN-3990: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 5s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 18s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 1s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 89m 55s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747731/0001-YARN-3990.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6374ee0 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8705/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8705/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8705/console | This message was automatically generated. AsyncDispatcher may be overloaded with RMAppNodeUpdateEvent when Node is connected/disconnected Key: YARN-3990 URL: https://issues.apache.org/jira/browse/YARN-3990 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3990.patch Whenever a node is added or removed, NodesListManager sends an RMAppNodeUpdateEvent to all the applications in the RMContext. But for finished/killed/failed applications, it is not required to send these events.
An additional check for whether the app is finished/killed/failed would minimize the unnecessary events:
{code}
public void handle(NodesListManagerEvent event) {
  RMNode eventNode = event.getNode();
  switch (event.getType()) {
  case NODE_UNUSABLE:
    LOG.debug(eventNode + " reported unusable");
    unusableRMNodesConcurrentSet.add(eventNode);
    for (RMApp app : rmContext.getRMApps().values()) {
      this.rmContext
          .getDispatcher()
          .getEventHandler()
          .handle(
              new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                  RMAppNodeUpdateType.NODE_UNUSABLE));
    }
    break;
  case NODE_USABLE:
    if (unusableRMNodesConcurrentSet.contains(eventNode)) {
      LOG.debug(eventNode + " reported usable");
      unusableRMNodesConcurrentSet.remove(eventNode);
    }
    for (RMApp app : rmContext.getRMApps().values()) {
      this.rmContext
          .getDispatcher()
          .getEventHandler()
          .handle(
              new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                  RMAppNodeUpdateType.NODE_USABLE));
    }
    break;
  default:
    LOG.error("Ignoring invalid eventtype " + event.getType());
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
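As an editor's illustration of the guard suggested above (not necessarily how the attached patch implements it; the helper name and the chosen set of terminal states are assumptions, and java.util.EnumSet must be imported), the per-app loops could skip applications that have already completed:
{code}
// Assumed set of terminal application states; the actual patch may check differently.
private static final EnumSet<RMAppState> COMPLETED_APP_STATES =
    EnumSet.of(RMAppState.FINISHED, RMAppState.FAILED, RMAppState.KILLED);

private boolean isApplicationCompleted(RMApp app) {
  return COMPLETED_APP_STATES.contains(app.getState());
}

// Inside handle(NodesListManagerEvent event), for both the NODE_UNUSABLE
// and NODE_USABLE branches shown above:
for (RMApp app : rmContext.getRMApps().values()) {
  if (isApplicationCompleted(app)) {
    continue; // no RMAppNodeUpdateEvent for finished/killed/failed applications
  }
  this.rmContext
      .getDispatcher()
      .getEventHandler()
      .handle(
          new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
              RMAppNodeUpdateType.NODE_USABLE));
}
{code}
Filtering before dispatch keeps terminal applications from flooding the AsyncDispatcher queue with events they will simply ignore.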
[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646048#comment-14646048 ] Naganarasimha G R commented on YARN-3945: - [~wangda] [~nroberts], The checkstyle report is not accurate, as the Eclipse code format template follows the coding guidelines wiki, and the whitespace is not exactly in the lines that were modified, but I can get both corrected along with the other review comments and doc updates. Can you please check the implementation and my comments on the doc so that I can modify it as required? maxApplicationsPerUser is wrongly calculated Key: YARN-3945 URL: https://issues.apache.org/jira/browse/YARN-3945 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch maxApplicationsPerUser is currently calculated based on the formula {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor)}} but description of userlimit is {quote} Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value.{color:red} The the former (the minimum value) is set to this property value {color} and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queues resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as a integer. {quote} A configuration value that describes the minimum limit should not be used in a formula that calculates the maximum number of applications for a user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
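For illustration only (an editor's sketch, not the committed fix; the helper name is hypothetical), the gap the description points at can be seen by contrasting the configured minimum with the effective share a single user may actually hold, which depends on the number of active users:
{code}
// Purely illustrative arithmetic based on the user-limit description quoted above.
static float effectiveUserSharePercent(int minimumUserLimitPercent, int activeUsers) {
  // The effective limit is the larger of the configured minimum and an equal
  // split among the active users, capped at 100%.
  return Math.min(100f,
      Math.max(minimumUserLimitPercent, 100f / Math.max(1, activeUsers)));
}

// effectiveUserSharePercent(25, 1) == 100.0f  (a lone user may fill the queue)
// effectiveUserSharePercent(25, 2) ==  50.0f
// effectiveUserSharePercent(25, 3) ==  33.33f (approximately)
// effectiveUserSharePercent(25, 4) ==  25.0f  (floor at the configured minimum)
{code}
Deriving maxApplicationsPerUser from the 25% floor therefore undercounts what a user is allowed to run whenever fewer than four users are active, which is why the formula should not be built on the minimum limit alone.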