[jira] [Commented] (YARN-4750) App metrics may not be correct when an app is recovered

2016-03-01 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175105#comment-15175105
 ] 

Srikanth Sampath commented on YARN-4750:


I agree with [~jianhe] that it would be expensive to update the metrics 
periodically.  However, it would be useful to indicate that the metrics are 
compromised.  One option is to set the metric once to a special value (say, a 
negative number) to indicate that it is compromised.  Just carrying on silently 
can be misleading.
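
A minimal sketch of the sentinel idea suggested above, assuming the RM knows at 
recovery time whether an attempt was still running.  The class, method, and 
constant names here are hypothetical and are not part of the ResourceManager 
code base; the choice of -1 is only one possible "special value".

```java
/** Hypothetical illustration only; not the ResourceManager's actual classes. */
public class RecoveredMetricSketch {
    /** Sentinel meaning "the in-memory part of this metric was lost on recovery". */
    public static final long COMPROMISED = -1L;

    /**
     * Decide which value to expose for a recovered attempt.
     *
     * @param savedMbSeconds    value last written to the state store
     * @param attemptWasRunning true if the attempt was still running when the RM
     *                          restarted, so part of its usage was only in memory
     */
    public static long recoverMbSeconds(long savedMbSeconds, boolean attemptWasRunning) {
        // A finished attempt was fully persisted; a running one was not.
        return attemptWasRunning ? COMPROMISED : savedMbSeconds;
    }

    /** Render the metric so the sentinel is never shown as a real number. */
    public static String render(long mbSeconds) {
        return mbSeconds == COMPROMISED
                ? "N/A (incomplete after RM recovery)"
                : Long.toString(mbSeconds);
    }

    public static void main(String[] args) {
        System.out.println(render(recoverMbSeconds(4096L, false)));
        System.out.println(render(recoverMbSeconds(4096L, true)));
    }
}
```

The sentinel is set once at recovery, so it avoids the cost of periodic updates; 
any consumer that sums or charts these values would need to treat the negative 
value as "unknown" rather than as a number.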

> App metrics may not be correct when an app is recovered
> ---
>
> Key: YARN-4750
> URL: https://issues.apache.org/jira/browse/YARN-4750
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> App metrics (or rather, app attempt metrics) like Vcore-seconds and 
> MB-seconds are saved in the state store when there is an attempt state 
> transition. Values for running attempts are held only in memory and are not 
> saved when there is an RM restart/failover. For a recovered app, the metric 
> values are reset, so in that case they will be incomplete. 
> Was this intentional, or have we not found a correct way to fix it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-09-01 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726634#comment-14726634
 ] 

Srikanth Sampath commented on YARN-2571:


Thanks [~steve_l].  I will ping you directly.

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID
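
The three steps listed in the issue description can be sketched with a small 
in-memory model.  This is an illustration under stated assumptions, not the 
patch attached to the issue: the class and method names are hypothetical, ACLs 
and permissions are not modelled, and the persistence strings are used only as 
example labels.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the registry; not the YARN registry API. */
public class RegistrySketch {
    /** A service record reduced to the two fields the cleanup step matches on. */
    static final class ServiceRecord {
        final String persistence;
        final String id;

        ServiceRecord(String persistence, String id) {
            this.persistence = persistence;
            this.id = id;
        }
    }

    private final Set<String> dirs = new TreeSet<>();
    private final Map<String, ServiceRecord> records = new TreeMap<>();

    /** Step 1, startup: create the root paths (system ACLs omitted here). */
    public void startup() {
        dirs.add("/services");
        dirs.add("/users");
    }

    /** Step 2, app-launch: create the per-user directory and return its path. */
    public String appLaunch(String username) {
        String path = "/users/" + username;
        dirs.add(path);
        return path;
    }

    /** Something an application would do: store a record under its user path. */
    public void register(String path, String persistence, String id) {
        records.put(path, new ServiceRecord(persistence, id));
    }

    /** Step 3, completion: remove records matching both persistence and ID. */
    public int purge(String persistence, String id) {
        int before = records.size();
        records.values().removeIf(
                r -> r.persistence.equals(persistence) && r.id.equals(id));
        return before - records.size();
    }

    public boolean hasDir(String path) {
        return dirs.contains(path);
    }

    public boolean hasRecord(String path) {
        return records.containsKey(path);
    }
}
```

For example, after startup() and appLaunch("alice"), a record registered with 
persistence "container" and a given container ID would be removed by a purge 
call with the same pair when that container completes, while records with other 
persistence values or IDs stay in place.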





[jira] [Commented] (YARN-3972) Work Preserving AM Restart for MapReduce

2015-08-25 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710927#comment-14710927
 ] 

Srikanth Sampath commented on YARN-3972:


Following a discussion with [~vvasudev], I am exploring using the YARN Service 
Registry so that containers can locate the MR AppMaster.

> Work Preserving AM Restart for MapReduce
> ---
>
> Key: YARN-3972
> URL: https://issues.apache.org/jira/browse/YARN-3972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Srikanth Sampath
>Assignee: Raju Bairishetti
> Attachments: WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart is provided by 
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like 
> to take advantage of this for MapReduce (MR) applications.  There are some 
> challenges, which are described in the attached document along with a few 
> options.  We solicit feedback from the community.





[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-08-25 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710946#comment-14710946
 ] 

Srikanth Sampath commented on YARN-2571:


[~ste...@apache.org], what's the status of this patch?  I am considering using 
the YARN registry for the MR AppMaster in YARN-3972 and want to take some 
learnings from here.






[jira] [Created] (YARN-3972) Work Preserving AM Restart for MapReduce

2015-07-24 Thread Srikanth Sampath (JIRA)
Srikanth Sampath created YARN-3972:
--

 Summary: Work Preserving AM Restart for MapReduce
 Key: YARN-3972
 URL: https://issues.apache.org/jira/browse/YARN-3972
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Srikanth Sampath


A framework for work-preserving AM restart is provided by 
[YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like to 
take advantage of this for MapReduce (MR) applications.  There are some 
challenges, which are described in the attached document along with a few 
options.  We solicit feedback from the community.





[jira] [Updated] (YARN-3972) Work Preserving AM Restart for MapReduce

2015-07-24 Thread Srikanth Sampath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sampath updated YARN-3972:
---
Attachment: WorkPreservingMRAppMaster.pdf



