[jira] [Commented] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696649#comment-14696649 ] Sunil G commented on YARN-4029:
---
Ah! By mistake I assigned it to myself. Reassigned to [~bibinchundatt]. Thank you.

Update LogAggregationStatus to store on finish
Key: YARN-4029
URL: https://issues.apache.org/jira/browse/YARN-4029
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Attachments: Image.jpg

Currently the log aggregation status is not updated in the state store, so after an RM restart it shows NOT_START.

Steps to reproduce:
1. Submit a MapReduce application.
2. Wait for it to complete.
3. Once the application is completed, switch the RM.

The *Log Aggregation Status* changes from SUCCESS to NOT_START.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4029:
---
Assignee: Bibin A Chundatt (was: Sunil G)

Update LogAggregationStatus to store on finish
Key: YARN-4029
URL: https://issues.apache.org/jira/browse/YARN-4029
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Attachments: Image.jpg

Currently the log aggregation status is not updated in the state store, so after an RM restart it shows NOT_START.

Steps to reproduce:
1. Submit a MapReduce application.
2. Wait for it to complete.
3. Once the application is completed, switch the RM.

The *Log Aggregation Status* changes from SUCCESS to NOT_START.
[jira] [Commented] (YARN-4014) Support user cli interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696594#comment-14696594 ] Sunil G commented on YARN-4014:
---
Hi [~rohithsharma],
Thank you. Overall the patch looks good. Some minor nits:
- In ApplicationClientProtocolPBServiceImpl, you may try something like the below:
{code}
} catch (YarnException | IOException e) {
  throw new ServiceException(e);
}
{code}
- In ClientRMService:
{code}
+    if (EnumSet.of(RMAppState.NEW, RMAppState.NEW_SAVING, RMAppState.FAILED,
+        RMAppState.FINAL_SAVING, RMAppState.FINISHING, RMAppState.FINISHED,
+        RMAppState.KILLED, RMAppState.KILLING, RMAppState.FAILED).contains(
+        application.getState()))
{code}
Could we have a lookup method for this rather than checking it directly? Maybe other APIs can use it too.
- In testUpdateApplicationPriorityRequest, could we also pass an invalid AppID and check that error condition?
- While printing the message:
{code}
+    pw.println(" -appId <Application ID>  ApplicationId can be used with any other");
+    pw.println("                          sub commands in future. Currently it is");
+    pw.println("                          used along only with -set-priority");
{code}
"-set-priority" can be changed to "-updatePriority".

Support user cli interface for Application Priority
Key: YARN-4014
URL: https://issues.apache.org/jira/browse/YARN-4014
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch

Track the changes for the user-RM client protocol, i.e. ApplicationClientProtocol, changes and discussions in this jira.
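[Editor's note] The two nits above (the multi-catch and a lookup method for the terminal-state check) can be sketched roughly as follows. This is only an illustration: `AppState` here is a stand-in for YARN's `RMAppState`, and `isAppInCompletedStates` is a hypothetical helper name, not anything in the patch.

```java
import java.util.EnumSet;

public class AppStateLookup {
    // Stand-in enum for RMAppState; the real class lives in the resourcemanager module.
    enum AppState {
        NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING,
        FINAL_SAVING, FINISHING, FINISHED, FAILED, KILLING, KILLED
    }

    // The lookup method suggested in the review: one shared definition of the
    // states in which a priority update is not allowed, reusable by other APIs.
    static final EnumSet<AppState> COMPLETED_APP_STATES = EnumSet.of(
        AppState.NEW, AppState.NEW_SAVING, AppState.FINAL_SAVING,
        AppState.FINISHING, AppState.FINISHED, AppState.FAILED,
        AppState.KILLING, AppState.KILLED);

    static boolean isAppInCompletedStates(AppState state) {
        return COMPLETED_APP_STATES.contains(state);
    }

    public static void main(String[] args) {
        // Multi-catch as suggested for ApplicationClientProtocolPBServiceImpl:
        // one catch clause covering both exception types.
        try {
            if (isAppInCompletedStates(AppState.RUNNING)) {
                throw new java.io.IOException("unexpected terminal state");
            }
        } catch (RuntimeException | java.io.IOException e) {
            // In the real service this would be: throw new ServiceException(e);
            throw new IllegalStateException(e);
        }
        System.out.println(isAppInCompletedStates(AppState.FINISHED));
    }
}
```

A single `EnumSet` constant also makes the duplicated `RMAppState.FAILED` in the quoted patch harder to reintroduce, since the set is defined once.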
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697063#comment-14697063 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697043#comment-14697043 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697038#comment-14697038 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697040#comment-14697040 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697061#comment-14697061 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697066#comment-14697066 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697118#comment-14697118 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697123#comment-14697123 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* hadoop-yarn-project/CHANGES.txt

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697120#comment-14697120 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697142#comment-14697142 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697140#comment-14697140 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697145#comment-14697145 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* hadoop-yarn-project/CHANGES.txt

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697314#comment-14697314 ] Ming Ma commented on YARN-221:
---
The unit test failures aren't related; the tests pass on the local machine.

Another thing Xuan and I discussed is how other frameworks on YARN, such as MR and Tez, can use this feature; for example, whether they need to make config and/or code changes to allow framework applications to specify the policy on a per-application basis. There are several approaches:
* Have MR define its own configurations for these policies. Make a code change in YarnRunner to retrieve these configurations and set the values on the ASC. That means Tez needs to do the same thing.
* Define some common YARN configurations such as yarn.logaggregation.policy.class. YarnRunner still needs to retrieve these configurations and set the values on the ASC, but at least MR and Tez can share the same configuration names.
* Define some common YARN configurations such as yarn.logaggregation.policy.class, and have YarnClientImpl take care of fixing up the ASC based on the configurations. That way, no code change is required at the MR or Tez layer.

Eventually, we prefer to go with the first approach, which is what other existing MR properties use. If we want to define some common YARN properties used by different YARN applications, we can have a separate jira for it.

NM should provide a way for AM to tell it not to aggregate logs.
Key: YARN-221
URL: https://issues.apache.org/jira/browse/YARN-221
Project: Hadoop YARN
Issue Type: Sub-task
Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch

The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to not aggregate logs in some cases and avoid connecting to the NN at all.
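[Editor's note] All three approaches in the comment above boil down to reading a policy class name from configuration and setting it on the ApplicationSubmissionContext (ASC). A minimal sketch of that fixup, under stated assumptions: `SubmissionContext` is a stand-in for the real ASC, `applyLogAggregationPolicy` and the default policy name are illustrative, and only the `yarn.logaggregation.policy.class` key comes from the comment itself.

```java
import java.util.Properties;

// Hypothetical stand-in for ApplicationSubmissionContext: holds the
// log-aggregation policy class name chosen for one application.
class SubmissionContext {
    String logAggregationPolicyClass; // null means "not set by the framework"
}

public class PolicyFixup {
    // Common YARN configuration key proposed in the comment.
    static final String POLICY_KEY = "yarn.logaggregation.policy.class";

    // The client-side fixup (approach 3): fill in the policy from common
    // configuration only when the framework did not set one, so MR and Tez
    // would need no code changes of their own.
    static void applyLogAggregationPolicy(SubmissionContext ctx, Properties conf) {
        if (ctx.logAggregationPolicyClass == null) {
            ctx.logAggregationPolicyClass =
                conf.getProperty(POLICY_KEY, "AllContainerLogAggregationPolicy");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(POLICY_KEY, "FailedContainerLogAggregationPolicy");

        SubmissionContext ctx = new SubmissionContext();
        applyLogAggregationPolicy(ctx, conf);
        System.out.println(ctx.logAggregationPolicyClass);
    }
}
```

Approach 1 differs only in where this logic lives (YarnRunner, with MR-specific key names) rather than in its shape.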
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697369#comment-14697369 ] Karthik Kambatla commented on YARN-1680:
---
Any updates here? We would like to get this or YARN-3446 in soon.

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
Key: YARN-1680
URL: https://issues.apache.org/jira/browse/YARN-1680
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.2.0, 2.3.0
Environment: SuSE 11 SP2 + Hadoop-2.3
Reporter: Rohith Sharma K S
Assignee: Tan, Wangda
Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch

There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts that free memory as part of the cluster).
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697449#comment-14697449 ] Varun Saxena commented on YARN-4053:
---
Also, for floating point metrics, the query can be in integral form, which can create issues too. We should clearly document that the query should also be in decimal representation for such metrics. That is, checking for a condition like {{m1 > 40}} should mean the query from the client has the filter {{m1 > 40.0}} in the REST API, so that it is interpreted as a floating point number.

Change the way metric values are stored in HBase Storage
Key: YARN-4053
URL: https://issues.apache.org/jira/browse/YARN-4053
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena

Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4024:
---
Issue Type: Improvement (was: Bug)

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
Key: YARN-4024
URL: https://issues.apache.org/jira/browse/YARN-4024
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo

Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress.
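[Editor's note] A common way to avoid re-resolving on every heartbeat is to memoize the resolution result per hostname. The sketch below is only an illustration of that idea, not the patch attached to this JIRA; the resolver function is injected so the cache logic can be shown without real DNS lookups.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Memoizing resolver sketch: a slow DNS server is consulted at most once
// per host rather than on every NM heartbeat.
public class CachingResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    public CachingResolver(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // computeIfAbsent calls the (possibly slow) resolver only on a cache miss.
    public String resolve(String hostname) {
        return cache.computeIfAbsent(hostname, resolver);
    }
}
```

A production version would also need invalidation, e.g. when a node re-registers with a new address or when entries age out; the sketch omits that.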
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697531#comment-14697531 ] Vrushali C commented on YARN-4025:
---
Hmm, yes, I think some more comments there might help (I should have included them in the earlier patch).

Deal with byte representations of Longs in writer code
Key: YARN-4025
URL: https://issues.apache.org/jira/browse/YARN-4025
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Vrushali C
Assignee: Sangjin Lee
Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch

Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code with conversions from Long to byte[] to String for easier argument passing between function calls; these values then end up being converted back to byte[] while storing in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as a few more function calls: a getColumnQualifier that accepts a pre-encoded byte array, in addition to the existing API which accepts a String, and a ColumnHelper that returns a byte[] column name instead of a String one. Filing this jira to track these changes.
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697549#comment-14697549 ] Vrushali C commented on YARN-3904:
---
A couple more things that came to mind. We need not change the patch for just these, but I wanted to say what's on my mind.
- Do we want to provide a dropTable API? I think we should not. In a production situation, this can be a costly mistake if someone is testing their code on the cluster. A drop table should be a very manual command, so that one is aware that they are running it.
- Are the '?' and ',' special characters in this line? If so, we don't have to change this right now, but maybe next time this code is being looked at we could make them into a constant:
{code}
String sql = "UPSERT INTO " + info.getTableName() + " ("
    + StringUtils.join(info.getPrimaryKeyList(), ",")
    + ", created_time, modified_time, metric_names) VALUES ("
    + StringUtils.repeat("?,", info.getPrimaryKeyList().length)
    + "?, ?, ?)";
{code}
The patch looks good overall. Thanks [~gtCarrera9].

Refactor timelineservice.storage to add support to online and offline aggregation writers
Key: YARN-3904
URL: https://issues.apache.org/jira/browse/YARN-3904
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch

Now that we have finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage for the aggregated data. In this JIRA, I'm proposing to refactor the writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information; we can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces.
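[Editor's note] The reviewer's suggestion above, hoisting the '?' and ',' placeholder text into named constants, can be sketched as follows. This uses only the JDK (`String.join`/`String.repeat` instead of commons-lang `StringUtils`), and the table and column names are illustrative, not the actual Phoenix schema.

```java
import java.util.List;

public class UpsertSqlBuilder {
    // JDBC positional parameter marker and separator, kept as named constants
    // per the review comment instead of being inlined in the SQL string.
    static final String PLACEHOLDER = "?";
    static final String SEPARATOR = ",";

    static String buildUpsertSql(String tableName, List<String> primaryKeys) {
        String keyColumns = String.join(SEPARATOR, primaryKeys);
        // One "?" per primary-key column plus three for the fixed columns.
        String placeholders =
            (PLACEHOLDER + SEPARATOR).repeat(primaryKeys.size())
            + PLACEHOLDER + ", " + PLACEHOLDER + ", " + PLACEHOLDER;
        return "UPSERT INTO " + tableName + " (" + keyColumns
            + ", created_time, modified_time, metric_names) VALUES ("
            + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(
            buildUpsertSql("flow_activity", List.of("cluster", "user", "flow")));
    }
}
```

The generated string would then be passed to `Connection.prepareStatement`, where '?' is the positional-parameter marker, which is why treating it as a named constant rather than a scattered literal helps readability.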
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697556#comment-14697556 ] Varun Saxena commented on YARN-4053:
---
bq. What kind of metrics do you have in mind that will have floating point numbers?
There was some plan for reporting cluster-level metrics in the future too, a few of which would be floating point as well; refer to the JSON in YARN-3881. I also remember some discussion during the aggregation design regarding storing averages. Are we planning to calculate them on the fly instead?
Moreover, TimelineMetric stores the metric value as a {{java.lang.Number}}. This means we are saying a metric can store a floating point value as well. As we have no control over systems outside YARN (say, Tez), if they use ATS and publish a metric of floating point type, I guess we should be able to handle it. Thoughts?
If it has been decided that metrics can only be integral values, then it's fine; we won't have to take care of it. Let me know.
Also, another key point we need to decide is whether we only support values up to signed longs (8 bytes).

Change the way metric values are stored in HBase Storage
Key: YARN-4053
URL: https://issues.apache.org/jira/browse/YARN-4053
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena

Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase.
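[Editor's note] The core of why a string encoding "does not quite serve our use case for metrics" can be shown with plain Java: HBase compares stored bytes lexicographically as unsigned values, and a decimal-string encoding disagrees with numeric order, while a fixed-width big-endian encoding agrees with it (for non-negative values). This is only a sketch of the problem, not the encoding the JIRA eventually adopts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MetricEncoding {
    // UTF-8 decimal-string encoding (what a GenericObjectMapper-style string
    // representation effectively produces for a number).
    static byte[] asString(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-width big-endian encoding: for non-negative longs, unsigned
    // lexicographic byte order matches numeric order.
    static byte[] asFixedWidth(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Unsigned lexicographic comparison, the order HBase applies to bytes.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // As strings, "9" sorts after "40" because '9' > '4' byte-wise.
        System.out.println(compareBytes(asString(9), asString(40)) > 0);
        // As fixed-width bytes, 9 correctly sorts before 40.
        System.out.println(compareBytes(asFixedWidth(9), asFixedWidth(40)) < 0);
    }
}
```

The open questions in the comment (floating point values, values wider than a signed long) are about extending such a fixed-width scheme beyond non-negative longs, which this sketch does not attempt.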
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697534#comment-14697534 ] Vrushali C commented on YARN-4025: -- I changed it from '?' to '='. Sangjin was also wondering whether we should change it or not (read the earlier comments in the jira). I think it might be good to change it now, since '?' is a wildcard character, and using a non-wildcard character makes reading easier while testing and helps the reader code as well. It's a very small change, so I thought another jira for this would be overkill. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697571#comment-14697571 ] Varun Saxena commented on YARN-4053: Tez may not be publishing any floating point metric as of now. I am not too sure about everything they publish, so probably there is no use case as of now. But if we do not support floating point numbers, then we should clearly document that we only support integral values, and do the conversion in the writer if any floating point value comes in. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697537#comment-14697537 ] Vrushali C commented on YARN-3904: -- A very minor comment.. I think there is a typo in PHEONIX_OFFLINE_STORAGE_CONN_STR_DEFAULT variable name in YarnConfiguration.java Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697545#comment-14697545 ] Li Lu commented on YARN-4025: - Oh sorry I missed that line... That looks fine. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697654#comment-14697654 ] Varun Saxena commented on YARN-3862: [~gtCarrera9], these 2 JIRAs were raised separately to address the following areas : # Enhance the already supported filters (YARN-3863) to filter out rows of data, by adding support for OR in addition to AND, and relational ops for metrics. The scope of this JIRA is pretty clear. # Restrict the amount of data retrieved (from columns) in this JIRA. Here, we actually wanted to have a discussion on what all we need to support: regex, prefix match, etc. Also whether we want to retrieve metrics by time windows as well. I am open to realigning these JIRAs and distributing the work along the lines of the workflow you mentioned above. My only concern with settling on a filter object model first, though, is that we may take a lot of time deciding how it should cover all the scenarios, because support for additional filters may come up during further discussion. Let's do whatever the consensus is. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. 
We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697583#comment-14697583 ] Vrushali C commented on YARN-4053: -- Hmm, good points. I think all metrics should be stored as the same type; otherwise we have to deal with knowing which metric is of which type, and would need to store metadata to know how to read it back. Storing it as an ASCII value is not good; we need to be able to query for things like less than, greater than, etc. My vote is for going with Longs for all metrics right now, unless there is a very strong use case where only decimals will do. We truncate (cast down) decimals to long if we receive any, so 99.9 becomes 99. I realize this is restrictive, but my thinking is that instead of trying to do everything for this current ATS release, let's go with Longs and see if we really need decimal precision. If we do, we can revisit and modify to accept more data types. cc [~jrottinghuis] Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
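The cast-down semantics Vrushali describes match Java's narrowing conversion, which rounds toward zero; a tiny illustrative sketch, not taken from any patch in this thread:

```java
public class TruncateDemo {
    // Casting a double down to long drops the fractional part
    // (rounding toward zero), so 99.9 stores as 99 and -99.9 as -99.
    static long truncate(double metricValue) {
        return (long) metricValue;
    }
}
```

One consequence worth noting: truncation (unlike rounding) biases aggregated sums downward when many small fractional values are dropped.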
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697663#comment-14697663 ] Varun Saxena commented on YARN-3862: bq. My feeling is that the concept of timeline filter may become a part of our object model, so that client users can easily communicate? Do we want to expose it to the client? The suggestion sounds good. That wasn't the plan, but if everyone agrees, let's have it that way. bq. are we treating our timeline filters as pure-data objects (models) Yes, as of now I am treating them as pure data objects. That is why, instead of using polymorphism and converting the filter to an HBase Filter via a conversion method in the filter class(es), I kept the conversion in a util class. The intention was to decouple filters from the storage implementation. bq. is it easy, or possible, for us to implement a paging filter? Will look into it. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
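A minimal sketch of what such pure-data filter objects could look like (class, enum, and field names here are hypothetical illustrations, not taken from the WIP patch): the model carries no HBase types, so a storage-specific utility can later walk the tree and emit HBase Filter objects.

```java
import java.util.Arrays;
import java.util.List;

// Marker base type for all pure-data timeline filters.
abstract class TimelineFilter { }

// A relational comparison on a single metric, e.g. "m1 > 4".
class TimelineCompareFilter extends TimelineFilter {
    enum Op { LESS_THAN, EQUAL, GREATER_THAN }
    final String metricId;
    final Op op;
    final long value;
    TimelineCompareFilter(String metricId, Op op, long value) {
        this.metricId = metricId;
        this.op = op;
        this.value = value;
    }
}

// A boolean combination of filters; OR is the YARN-3863 enhancement
// on top of the existing AND semantics.
class TimelineFilterList extends TimelineFilter {
    enum Operator { AND, OR }
    final Operator operator;
    final List<TimelineFilter> filters;
    TimelineFilterList(Operator operator, TimelineFilter... filters) {
        this.operator = operator;
        this.filters = Arrays.asList(filters);
    }
}
```

Keeping conversion logic out of these classes, in a separate converter utility, is what preserves the decoupling from the storage implementation discussed here.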
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697667#comment-14697667 ] Varun Saxena commented on YARN-3862: BTW, I did not upload a WIP patch for YARN-3863 due to the issue raised in YARN-4053. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697341#comment-14697341 ] Wangda Tan commented on YARN-4024: -- [~zhiguohong], the DNS cache is a global parameter for a JVM, correct? IMHO, we shouldn't use the global parameter, because the RM may need to get the latest IP address from DNS for other purposes. For example, the RM needs to get the latest address when NMs are registering (and also reconnecting), but it may not need it while NMs are running. Thoughts? YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
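For context, and assuming the standard JDK resolver, the JVM-global knob in question is the {{networkaddress.cache.ttl}} security property: one process-wide value governs every InetAddress lookup, which is exactly why tuning it for heartbeats would also affect registration-time lookups.

```java
import java.security.Security;

public class DnsCacheTtlDemo {
    public static void main(String[] args) {
        // networkaddress.cache.ttl is a java.security property, set once
        // per JVM: there is no per-call or per-subsystem override, so a
        // longer TTL for NM heartbeat lookups would equally delay the RM
        // seeing a new IP at NM (re)registration.
        Security.setProperty("networkaddress.cache.ttl", "30"); // seconds
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```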
[jira] [Updated] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4029: --- Attachment: 0001-YARN-4029.patch Attaching initial patch. Please do review. Update LogAggregationStatus to store on finish -- Key: YARN-4029 URL: https://issues.apache.org/jira/browse/YARN-4029 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-4029.patch, Image.jpg Currently the log aggregation status is not getting updated to Store. When RM is restarted will show NOT_START. Steps to reproduce 1.Submit mapreduce application 2.Wait for completion 3.Once application is completed switch RM *Log Aggregation Status* are changing *Log Aggregation Status* from SUCCESS to NOT_START -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4053: --- Description: Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. was: Currently HBase implementation uses GenericObjectMapper is used to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697443#comment-14697443 ] Varun Saxena commented on YARN-4053: Storing metric values (which are numbers) as strings is fine if we want to check them for equality. But we have to support all relational operations for metrics, and that is where the string representation doesn't work. This is because HBase filters currently use lexicographic comparison. This means that with the current mechanism for storing metric values, a value of 4000 will be judged as smaller than 60. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
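The mis-ordering is easy to reproduce with plain String comparison, which mirrors the byte-wise lexicographic comparison HBase's default binary comparator performs:

```java
public class LexOrderDemo {
    public static void main(String[] args) {
        // Byte-wise comparison decides at the first differing character:
        // '4' < '6', so the string "4000" sorts before "60" even though
        // 4000 > 60 numerically -- the problem described above.
        System.out.println("4000".compareTo("60") < 0); // true: "4000" sorts first
        System.out.println(4000L < 60L);                // false numerically
    }
}
```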
[jira] [Created] (YARN-4053) Change the way metric values are stored in HBase Storage
Varun Saxena created YARN-4053: -- Summary: Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697500#comment-14697500 ] Vrushali C commented on YARN-4053: -- I think metric values should be stored (and read back) as Longs. What kind of metrics do you have in mind that will have floating point numbers? Any percentages that we want to store? I don't think we really need that level of precision. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697445#comment-14697445 ] Varun Saxena commented on YARN-4053: So to resolve this we need some other way of storing metric values. The options are as follows: # Keep the current way of storing metric values, and write a custom filter to match the values. But this would need the new filter to be deployed on all region servers, so this solution may not be feasible. If we do not want to do this, then for lexicographic comparison to work, the sizes of the byte arrays being compared should be equal. # Store values as primitive types, i.e., a long as 8 bytes, an integer as 4 bytes, and so on. But this can create problems in lexicographic comparison too. Say metric m1 is stored as a long, but a query to the reader is of the form {{m1 > 4}}. As 4 will be interpreted as an Integer, we would try to compare 4 bytes against 8 bytes. The solution is to store every integral value as a long (8 bytes) and every floating point value as a double; the same approach can be used while matching on the reader side. # But the above solution may not work if we want to support BigInteger and BigDecimal values (i.e., numerical values wider than 8 bytes). Although 8 bytes should be enough, aggregated values may exceed 8 bytes. In this case, we can decide up to how many bytes we need to support; 16 bytes, or for that matter even 12 bytes, should be more than enough for all realistic scenarios. While encoding, we can pad with zeroes in front if the number is shorter than that width. # Another option is to continue supporting the string representation and restrict the maximum number of digits we support before and after the decimal point, say 30 digits before the decimal point and 3 after. We can pad the remaining bytes with zeroes while storing so that comparison can be done. 
Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
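Option 2's fixed-width idea can be sketched with plain JDK calls. One subtlety: big-endian two's-complement bytes only sort correctly under unsigned lexicographic comparison if the sign bit is flipped first, so that negative values order before positive ones. This is an illustrative sketch of the technique, not the implementation that was committed:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FixedWidthMetricEncoding {
    // Widen every integral metric value to 8 big-endian bytes, flipping
    // the sign bit so that unsigned lexicographic byte comparison (the
    // comparison HBase filters perform) agrees with numeric order,
    // negatives included.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                         .putLong(value ^ Long.MIN_VALUE)
                         .array();
    }

    // Lexicographic unsigned comparison over the encoded bytes.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

Widening reader-side literals the same way (a query's 4 becomes 8 bytes before comparison) also removes the 4-byte-vs-8-byte mismatch described in option 2.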
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697411#comment-14697411 ] Wangda Tan commented on YARN-1680: -- [~kasha], sorry, I don't have a chance to take this, so I am unassigning myself. I suggest we finish MAPREDUCE-6302 first (the approach of MAPREDUCE-6302 looks good to me) to resolve such deadlock issues. The availableResources calculation can be improved after that. Thoughts? availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Assignee: Tan, Wangda Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) becomes unstable (3 maps got killed), and the MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResource that includes the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1680: - Assignee: (was: Tan, Wangda) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) becomes unstable (3 maps got killed), and the MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResource that includes the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697454#comment-14697454 ] Varun Saxena commented on YARN-4053: cc [~sjlee0], [~djp], [~zjshen], [~vinodkv]. Thoughts? I will implement one of the options above depending on whatever the consensus is. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697816#comment-14697816 ] Vrushali C commented on YARN-3904: -- +1, yes we can move ahead. I am quite curious, though: how is the accessibility being restricted? The method has no specifier, so that means it is package-level visible, no? Also, the @Private and @VisibleForTesting annotations are only annotations; they don't really affect the private/public accessibility of the function. Or am I mistaken? That said, let's go ahead with the patch; my question is only for discussion purposes. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697840#comment-14697840 ] Li Lu commented on YARN-3862: - bq. That is why, instead of using polymorphism and converting the filter to an HBase Filter via a conversion method in the filter class(es), I kept the conversion in a util class. The intention was to decouple filters from the storage implementation. I agree with this approach. Meanwhile, we may also want to restrict the scope of the util class: instead of putting the logic in TimelineReaderUtils, feel free to add something like HBaseFilterConverter to model the filter conversion logic. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697967#comment-14697967 ] Naganarasimha G R commented on YARN-4053: - [~vrushalic], how about double? I feel it would be better, as it takes the same size as a long (8 bytes) and supports decimals too. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.009.patch Fixed the typo raised by [~vrushalic]. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697761#comment-14697761 ] Jian He commented on YARN-4014: --- some comments on my side: - updateApplicationPriority has two RPC calls, one to get the appReport and the other to update the priority. Can we make it one call? We can make updateApplicationPriority throw an ApplicationNotRunningException and let the client catch the exception and print an “Application not running” msg. - I missed two things in YARN-3887, would you mind fixing those here? -- the updateApplicationStateSynchronously should not send the APP_UPDATE_SAVED events, and so RMAppImpl should not need to handle this event as changed in this patch. -- CapacityScheduler#updateApplicationPriority should not be synchronized. It’ll cause problems if we hold the capacity scheduler lock while accessing the state-store. Support user cli interface in for Application Priority -- Key: YARN-4014 URL: https://issues.apache.org/jira/browse/YARN-4014 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch Track the changes for the user-RM client protocol (i.e. ApplicationClientProtocol) and related discussions in this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
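Jian He's first suggestion — collapse the two RPCs into one and signal the not-running case with an exception the client simply catches and prints — might look as follows. The exception type and method signature here are placeholders for illustration, not the actual YARN API:

```java
// Hypothetical stand-in for the exception Jian He proposes.
class ApplicationNotRunningException extends Exception {
    ApplicationNotRunningException(String msg) { super(msg); }
}

public class PriorityUpdateSketch {
    // Single RPC: the server checks the app state itself and throws,
    // instead of the client first fetching an ApplicationReport.
    static void updateApplicationPriority(String appId, int priority, boolean running)
            throws ApplicationNotRunningException {
        if (!running) {
            throw new ApplicationNotRunningException(
                "Application " + appId + " is not running");
        }
        // ... persist the new priority to the state store ...
    }

    public static void main(String[] args) {
        try {
            updateApplicationPriority("application_1_0001", 10, false);
        } catch (ApplicationNotRunningException e) {
            // Client side: catch and print, as suggested in the comment.
            System.out.println(e.getMessage());
        }
    }
}
```

This halves the round trips for the common case and keeps the state check atomic on the server, avoiding a race between the report fetch and the update.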
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697845#comment-14697845 ] Li Lu commented on YARN-3862: - Oh and, BTW, I think it's pretty much fine on the code side, so please feel free to proceed with this JIRA as planned. Thanks! Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma-separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support one of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful for plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-7.patch Merging to trunk with the newest resource monitoring structure CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Assignee: Inigo Goiri Priority: Minor Labels: BB2015-05-TBR, containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
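The core of the YARN-3458 proposal — treating 1 jiffy as 1 ms on Windows and deriving a usage percentage from the delta of cumulative CPU time over the wall-clock sampling interval — reduces to arithmetic along these lines. This is a simplified sketch of the idea, not the actual CpuTimeTracker code:

```java
public class CpuPercentSketch {
    // Cumulative CPU time is tracked in milliseconds (1 jiffy = 1 ms on Windows).
    // Returns percent of one core used between two samples, or -1 if unavailable.
    static float cpuUsagePercent(long prevCpuMs, long curCpuMs,
                                 long prevSampleMs, long curSampleMs) {
        long wall = curSampleMs - prevSampleMs;
        if (wall <= 0) {
            return -1f; // no elapsed wall-clock time yet; usage unavailable
        }
        return 100f * (curCpuMs - prevCpuMs) / wall;
    }

    public static void main(String[] args) {
        // 500 ms of CPU consumed over a 1000 ms window -> 50% of one core.
        System.out.println(cpuUsagePercent(1000, 1500, 0, 1000));
    }
}
```

Values above 100 are possible on multi-core hosts, since a process tree can consume more than one core's worth of CPU time per wall-clock second.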
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697970#comment-14697970 ] Hadoop QA commented on YARN-3458: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 19s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 26s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 1m 56s | Tests failed in hadoop-yarn-common. 
| | | | 39m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.util.TestRackResolver | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750623/YARN-3458-7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / dc7a061 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8847/artifact/patchprocess/whitespace.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8847/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8847/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8847/console | This message was automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Assignee: Inigo Goiri Priority: Minor Labels: BB2015-05-TBR, containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1469#comment-1469 ] Li Lu commented on YARN-4053: - Hi [~varun_saxena], I agree this is a valid issue. Before we get deeply involved in it, I'm wondering whether it is blocking any of our ongoing tasks to finish our planned POC of the reader and web UI? Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697818#comment-14697818 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 7s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 11s | The applied patch generated 1 new checkstyle issues (total was 214, now 214). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 25s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 43m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750594/YARN-3904-YARN-2928.009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / f40c735 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8846/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8846/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8846/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8846/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8846/console | This message was automatically generated. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. 
We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697758#comment-14697758 ] Li Lu commented on YARN-3904: - Thanks [~vrushalic]! I agree we should not make a public dropTable API. Actually, in my code I'm restricting the accessibility of this method to test only. About the special characters, the commas and question marks are used for prepared SQL statements in JDBC, which should be quite stable by now. But I agree that we should clean up the SQL statements when we touch this part in the future. For now, if it's fine with all of us, maybe we can put this in and move forward with the offline aggregation implementations? Thanks! Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
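The point about commas and question marks being standard JDBC syntax is worth illustrating: generated statements use one '?' per column, and the driver binds values separately, so no escaping of the values themselves is needed. A toy generator with a made-up table and column names (not the actual Phoenix writer code):

```java
import java.util.StringJoiner;

public class UpsertSqlSketch {
    // Build an UPSERT statement with one '?' placeholder per column.
    // JDBC binds the actual values via PreparedStatement.setXxx() later,
    // so commas or quotes inside values need no special handling.
    static String buildUpsert(String table, String... columns) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        for (String c : columns) {
            cols.add(c);
            marks.add("?");
        }
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildUpsert("aggregated_metrics",
            "id", "created_time", "metrics"));
    }
}
```

Only the table and column identifiers are concatenated into the SQL text; all data flows through the placeholders, which is what makes the generated statements stable.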
[jira] [Commented] (YARN-3986) getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead
[ https://issues.apache.org/jira/browse/YARN-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697801#comment-14697801 ] Jian He commented on YARN-3986: --- the proposal makes sense to me, thanks ! getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead -- Key: YARN-3986 URL: https://issues.apache.org/jira/browse/YARN-3986 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3986.01.patch, YARN-3986.02.patch Currently getTransferredContainers is present in {{AbstractYarnScheduler}}. *But in ApplicationMasterService, while registering AM, we are calling this method by typecasting it to AbstractYarnScheduler, which is incorrect.* This method should be moved to YarnScheduler. Because if a custom scheduler is to be added, it will implement YarnScheduler, not AbstractYarnScheduler. As ApplicationMasterService is calling getTransferredContainers by typecasting it to AbstractYarnScheduler, it is imposing an indirect dependency on AbstractYarnScheduler for any pluggable custom scheduler. We can move the method to YarnScheduler and leave the definition in AbstractYarnScheduler as it is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
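The refactoring proposed in YARN-3986 — declare the method on the interface so ApplicationMasterService needs no downcast, while the shared definition stays in the abstract base — is the standard Java pattern below. The types are simplified stand-ins for the YARN classes, with container objects reduced to strings:

```java
import java.util.Collections;
import java.util.List;

interface YarnSchedulerSketch {
    // Declared on the interface: callers program against the interface,
    // so no typecast to the abstract base class is needed.
    List<String> getTransferredContainers(String appAttemptId);
}

abstract class AbstractYarnSchedulerSketch implements YarnSchedulerSketch {
    // The shared definition is left in the abstract base, as proposed.
    @Override
    public List<String> getTransferredContainers(String appAttemptId) {
        return Collections.emptyList();
    }
}

// A pluggable custom scheduler only has to satisfy the interface;
// extending the abstract base is optional.
class CustomScheduler extends AbstractYarnSchedulerSketch { }

public class SchedulerInterfaceSketch {
    public static void main(String[] args) {
        YarnSchedulerSketch scheduler = new CustomScheduler(); // no downcast
        System.out.println(scheduler.getTransferredContainers("attempt_1").size());
    }
}
```

A scheduler that implements only YarnScheduler (without the abstract base) would then still satisfy every caller, which is exactly the indirect dependency the issue wants removed.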
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697952#comment-14697952 ] Naganarasimha G R commented on YARN-3045: - [~djp] [~sjlee0], Seems like the patch is failing on the new YARN-2928 branch... will rebase and upload a new one. [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697824#comment-14697824 ] Li Lu commented on YARN-3904: - Oh, right now the test is using this utility method, so it has to be default. We're adding the annotations to avoid adding it to any public javadocs or API lists. This is also an agreement among the reviewers. I agree it's not quite enough, and I'm considering moving this dangerous part to the test component in the offline aggregator JIRA. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697829#comment-14697829 ] Li Lu commented on YARN-3862: - I'm not worried that having a filter object model will slow everything down. Sure, we may not cover everything in the first draft, or even in the first JIRA. However, if we know we're on the right track, we're making progress. If we realize any use case limitations we can always fix them later, but at this early stage let's first have the right framework and get our planned goals done. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma-separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support one of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful for plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698126#comment-14698126 ] Hadoop QA commented on YARN-3045: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 21s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:red}-1{color} | javac | 7m 58s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 49s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 9m 20s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 6m 9s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 57m 28s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750640/YARN-3045-YARN-2928.010.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-2928 / f40c735 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/diffJavacWarnings.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8848/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8848/console | This message was automatically generated. 
[Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045-YARN-2928.010.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045: Attachment: YARN-3045-YARN-2928.010.patch rebased the patch [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045-YARN-2928.010.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696825#comment-14696825 ] Hudson commented on YARN-3987: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/]) YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/CHANGES.txt am container complete msg ack to NM once RM receive it -- Key: YARN-3987 URL: https://issues.apache.org/jira/browse/YARN-3987 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: sandflee Assignee: sandflee Fix For: 2.8.0 Attachments: YARN-3987.001.patch, YARN-3987.002.patch In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed containers (AM containers) in the NM. A completed container is removed from the NM and NMStateStore only after the container-complete event is passed to the AM, but if the AM couldn't be launched, the completed AM container couldn't be cleaned up and may eat up NM heap memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696823#comment-14696823 ] Hudson commented on YARN-4005: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/]) YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Completed container whose app is finished is not removed from NMStateStore -- Key: YARN-4005 URL: https://issues.apache.org/jira/browse/YARN-4005 Project: Hadoop YARN Issue Type: Bug Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.8.0 Attachments: YARN-4005.01.patch If a container is completed and its corresponding app is finished, NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. Then NM will not remove it from NMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696828#comment-14696828 ] Hudson commented on YARN-4047: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/]) YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Labels: 2.6.1-candidate Fix For: 2.7.2 Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696842#comment-14696842 ] Hudson commented on YARN-4047:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java

ClientRMService getApplications has high scheduler lock contention
--
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
--
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696839#comment-14696839 ] Hudson commented on YARN-3987:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt

am container complete msg ack to NM once RM receive it
--
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed right after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and the NMStateStore only once its completion has been passed to the AM; but if the AM cannot be launched, the completed AM containers are never cleaned up and may eat up NM heap memory.
--
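The description above is an acknowledgement protocol in miniature: the NM must hold a completed container until its completion is confirmed as delivered, and the fix has the RM acknowledge the AM container as soon as the attempt finishes, instead of waiting for an AM that may never run again. A minimal sketch of that hold-until-ack shape, with illustrative names rather than Hadoop's actual classes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the ack protocol behind YARN-3987 -- names are assumptions,
// not Hadoop's. The NM may only forget a completed container once it is
// acknowledged; without an ack, the pending list grows with every failed
// AM attempt and eats heap.
public class CompletedContainerAckDemo {
    final List<String> pendingCompleted = new ArrayList<>();

    // A container finished; it must be retained until acknowledged.
    void containerCompleted(String containerId) {
        pendingCompleted.add(containerId);
    }

    // With the fix, the RM acks the AM container when the attempt finishes,
    // so this fires even when no new AM ever launches to consume the event.
    void onRmAck(List<String> acked) {
        pendingCompleted.removeAll(acked);
    }
}
```

The key design point is who sends the ack: tying it to the RM's own attempt-finished transition, rather than to delivery to a (possibly never-relaunched) AM, bounds the NM's retained state.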
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696837#comment-14696837 ] Hudson commented on YARN-4005:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

Completed container whose app is finished is not removed from NMStateStore
--
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. As a result, the NM never removes it from the NMStateStore.
--
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696957#comment-14696957 ] Eric Payne commented on YARN-4014:
--
{code}
+pw.println( -appId Application ID ApplicationId can be used with any other);
+pw.println( sub commands in future. Currently it is);
+pw.println( used along only with -set-priority);
...
+ ApplicationId can be used with any other sub commands in future.
+
+ Currently it is used along only with -set-priority);
{code}
This is a minor point, but in these two places I would simply state something like the following: {{ID of the affected application.}} That way, when the switch is used in the future by other sub-commands, the developer doesn't have to remember to change these statements.

Support user cli interface in for Application Priority
--
Key: YARN-4014
URL: https://issues.apache.org/jira/browse/YARN-4014
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch

Track the user-RM client protocol (i.e. ApplicationClientProtocol) changes and discussions in this jira.
--