[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-04-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391872#comment-14391872
 ] 

Vinod Kumar Vavilapalli commented on YARN-2261:
---

MAPREDUCE-4099 originally facilitated this for MapReduce in a not so ideal way.

 YARN should have a way to run post-application cleanup
 --

 Key: YARN-2261
 URL: https://issues.apache.org/jira/browse/YARN-2261
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 See MAPREDUCE-5956 for context. Specific options are at 
 https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325548#comment-14325548
 ] 

Bikas Saha commented on YARN-2261:
--

Looks like AM preemption will not fail the AM and so the comments about AM 
preemption are probably not valid.


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-17 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325547#comment-14325547
 ] 

Bikas Saha commented on YARN-2261:
--

Sounds reasonable. Though things like how YARN exposes this information to 
the user would eventually need to be thought about. Currently the RM page shows 
running applications. How will it show cleanup in progress? Can the original AM 
be preempted due to lack of resources? In that case, how will we launch the 
cleanup container? The same problem probably exists now, because if an AM is 
preempted it would not be able to clean up. However, moving this responsibility 
from the user to YARN (as proposed in this jira) makes that a YARN problem to 
solve.


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-17 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325472#comment-14325472
 ] 

Xuan Gong commented on YARN-2261:
-

[~ste...@apache.org] [~bikassaha] [~vinodkv] any further comments on the 
proposal?


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323321#comment-14323321
 ] 

Xuan Gong commented on YARN-2261:
-

Thanks for the comments, Steve.

bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore 
max...I'd advocate less memory, but if pmem limits are turned on that's 
dangerous.

bq. would there be any actual/best effort offerings of the interval between AM 
termination and clean up scheduling?

I thought about this. The options are:
* Request the resource for the clean-up container separately, after the 
application is finished/failed/killed. In this case, the clean-up container can 
have its own resource requirement. But, as Vinod's comment notes, the cleanup 
container may not get resources because the cluster may have gotten busy after 
the final AM exit.
* Request the resource for the clean-up container at the same time as we 
request the resource for the AM container, and reserve it. After the final AM 
exits, we use this reserved resource to launch the clean-up container. In this 
case too, the clean-up container can have its own resource requirement. But 
this option is not ideal, because the AM does not know whether it is the final 
attempt. Even the RM does not know whether the current attempt is the final 
one; it only knows whether the previous attempt was final when it decides 
whether it needs to launch the next attempt. So we would need to request the 
resource for the clean-up container every time we request the resource for an 
AM container. If the current AM container is not the final one, we waste the 
resource.
* Reuse the AM container resource, as I proposed. Once the container-resize 
feature is ready, we could definitely let the clean-up container have its own 
resource requirement.

Those are all the options that I can think of for clean-up container 
scheduling, and that is why I propose that we just reuse the AM container 
resource.

bq. My token concern is related to long lived apps: what tokens will they get?

Currently, we could just give it all the latest tokens which the AM has. I 
understand that for LRS apps this is not enough. But I think the AM has a 
similar token renewal/update issue, so we could fix those together.

bq. How does this mix up with pre-emption?

This is a good point. The resource for the clean-up container still belongs to 
the application's resources. I think that we could do one of:
* if the container is a clean-up container, we cannot pre-empt it
OR
* if the clean-up container is pre-empted, we simply stop the clean-up process 
without retry, and mark it as a clean-up failure.
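
A minimal sketch of the second option, in plain Java rather than against the real RM code. The class, enum and method names are hypothetical; the only behaviour taken from this thread is that a pre-empted clean-up container is not retried and is marked as a clean-up failure, while other failures may be retried up to a limit (the "can retry" property proposed earlier).

```java
/** Hypothetical policy: what to do when a clean-up container exits. */
final class CleanupOutcomePolicy {

    /** How the clean-up container ended (illustrative, not YARN's exit statuses). */
    enum Exit { SUCCESS, FAILURE, PREEMPTED }

    /** What the scheduler-side logic does next. */
    enum Decision { DONE, RETRY, CLEANUP_FAILED }

    static Decision decide(Exit exit, int attemptsSoFar, int maxAttempts) {
        switch (exit) {
            case SUCCESS:
                return Decision.DONE;
            case PREEMPTED:
                // Pre-emption ends clean-up immediately: no retry, mark as failed.
                return Decision.CLEANUP_FAILED;
            case FAILURE:
                // Ordinary failures are retried until the attempt budget is used up.
                return attemptsSoFar < maxAttempts ? Decision.RETRY : Decision.CLEANUP_FAILED;
            default:
                throw new IllegalArgumentException("unknown exit: " + exit);
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(Exit.PREEMPTED, 1, 3));
        System.out.println(decide(Exit.FAILURE, 1, 3));
    }
}
```

Under the first option (clean-up containers are never pre-empted) the `PREEMPTED` case would simply never occur.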





[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321376#comment-14321376
 ] 

Steve Loughran commented on YARN-2261:
--

# Maybe the cleanup containers could have lower limits on allocation: 1 vcore 
max...I'd advocate less memory, but if pmem limits are turned on that's 
dangerous. 
# My token concern is related to long lived apps: what tokens will they get?
# How does this mix up with pre-emption?
# Would there be any actual/best effort offerings of the interval between AM 
termination and clean up scheduling?

I can see the appeal of rerunning in the AM container; there's a special case 
of repeated AM failure where the cleanup code may still be needed. Perhaps some 
history could be passed in (env var = FS URL of history) so cleanup logic 
could be smarter.

I think I'd really need to design a cleanup routine for slider to know what 
we'd actually need to run. Probably some of
* YARN registry cleanup (unless YARN-2571 actually gets committed, that being a 
far simpler option)
* HDFS cleanup (potentially)



[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2015-02-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321149#comment-14321149
 ] 

Xuan Gong commented on YARN-2261:
-

First of all, I think users should know what the clean-up container needs to 
do and how, and it is their responsibility to give the RM hints about how the 
clean-up container can be launched. YARN, based on the provided hints, should 
provide enough resources to launch the container and monitor it.

The clean-up container itself should have at least the following properties:
* Should be optional. 
* Have a configurable time-out. We do not want the container to take a long 
time to do the clean-up.
* Can retry

The most challenging part of this problem is how and when the RM gets the 
resource to launch this clean-up container. The proposal is that when the AM 
container finishes, instead of releasing this container immediately, we re-use 
its resource to launch the clean-up container. 

As with the AM container, the RM will launch this clean-up container. In that 
case, the latest token information could be provided. 

[~bikassaha] [~ste...@apache.org] suggestions?


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-11-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193324#comment-14193324
 ] 

Steve Loughran commented on YARN-2261:
--

One problem with this proposal as is: it doesn't address AM failure, 
especially the final failure, after which the AM will not be restarted. 

If an AM could specify the cleanup routine to execute on cleanup (same 
resources as for the normal AM), this would be possible. For Slider we'd just 
add a new CLI option to the same entry point, something like {{-D 
slider.am.cleanup=true}}
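
A minimal sketch of how a single entry point might branch on such an option. Only the key `slider.am.cleanup` comes from the comment above; the class and method names are hypothetical, and the hand-rolled parsing of `-D key=value` pairs is an assumption for illustration, not Slider's actual argument handling.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical AM entry point that switches to cleanup mode on a -D option. */
final class CleanupAwareEntryPoint {

    /** Key taken from the comment above; everything else here is illustrative. */
    static final String CLEANUP_KEY = "slider.am.cleanup";

    /** Collects "-D key=value" pairs, with -D and the definition as separate arguments. */
    static Map<String, String> parseDefinitions(String[] args) {
        Map<String, String> defs = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-D".equals(args[i])) {
                String def = args[i + 1];
                int eq = def.indexOf('=');
                if (eq > 0) {
                    defs.put(def.substring(0, eq), def.substring(eq + 1));
                }
                i++; // skip the definition that was just consumed
            }
        }
        return defs;
    }

    /** True only when the cleanup key is explicitly set to "true". */
    static boolean isCleanupMode(String[] args) {
        return Boolean.parseBoolean(parseDefinitions(args).get(CLEANUP_KEY));
    }

    public static void main(String[] args) {
        if (isCleanupMode(args)) {
            System.out.println("cleanup mode: release external state, then exit");
        } else {
            System.out.println("normal mode: run the application master");
        }
    }
}
```

Because the same launch command is reused, a cleanup run would need the same resources as the normal AM, which matches the constraint stated above.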


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-11 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058745#comment-14058745
 ] 

Robert Joseph Evans commented on YARN-2261:
---

+1 either approach seems fine to me.  Vinod's requires an opt in, which is nice 
from a backwards compatibility standpoint.  Also do we want to count the 
cleanup container as a running application?  We definitely need to count its 
resources against any queue it is a part of, but for a queue that is configured 
to run mostly large applications, it could have other applications back up 
behind the cleanup containers.


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058979#comment-14058979
 ] 

Bikas Saha commented on YARN-2261:
--

The cleanup would be indistinguishable from an AM that is cleaning up post job 
completion (as happens today). Especially if we use the second approach (via 
AM cleanup mode), this would be virtually indistinguishable from what happens 
today.


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-11 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059060#comment-14059060
 ] 

Robert Joseph Evans commented on YARN-2261:
---

Yes and that is not necessarily a good thing.  Especially if cleanup can take a 
relatively long period of time.


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059094#comment-14059094
 ] 

Bikas Saha commented on YARN-2261:
--

Would that be an existing issue that needs to be tracked separately?


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054571#comment-14054571
 ] 

Vinod Kumar Vavilapalli commented on YARN-2261:
---

The proposal here is to have a YARN application-level cleanup container that 
runs only as the last thing for an application in the cluster.
 - In a way, we already have this today, as we let AMs hang around for a while 
(by default - 10mins) *after* unregister - this feature makes it explicit.
 - For those who have been around this space for a while, this is akin to 
MR job-cleanup.
 - This feature lets apps submit a separate container-launch-context for 
cleanup, one that is only run after the app is done for real.
 - Clearly, it will
-- be optional.
-- have timeouts on how much time it can take to finish (default, 
overridable, and upper limit. Default = today's time for AMs to exit after 
unregister?)
-- have resource request limits like usual
-- possibly have its own retries (Cleanup failure != Application failure as 
today?)

Some challenges
 - Cleanup container may not get resources because cluster may have gotten busy 
after the final AM exit. Solution is to reserve (part of) resources used by the 
last AM for use by the cleanup container
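
A minimal sketch of the "default, overridable, and upper limit" timeout rule, in plain Java. The class and method names are hypothetical and nothing here is an existing YARN API. The 10-minute default mirrors the figure quoted above, while the 30-minute upper limit in `main` is an arbitrary assumption.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical policy resolving the timeout applied to a cleanup container. */
final class CleanupTimeoutPolicy {
    private final Duration defaultTimeout;
    private final Duration upperLimit;

    CleanupTimeoutPolicy(Duration defaultTimeout, Duration upperLimit) {
        if (defaultTimeout.isNegative() || defaultTimeout.isZero()) {
            throw new IllegalArgumentException("default timeout must be positive");
        }
        if (defaultTimeout.compareTo(upperLimit) > 0) {
            throw new IllegalArgumentException("default timeout exceeds upper limit");
        }
        this.defaultTimeout = defaultTimeout;
        this.upperLimit = upperLimit;
    }

    /** Uses the app's positive override if present, else the default, capped at the limit. */
    Duration effectiveTimeout(Optional<Duration> requested) {
        Duration chosen = requested
                .filter(d -> !d.isNegative() && !d.isZero())
                .orElse(defaultTimeout);
        return chosen.compareTo(upperLimit) > 0 ? upperLimit : chosen;
    }

    public static void main(String[] args) {
        CleanupTimeoutPolicy policy =
                new CleanupTimeoutPolicy(Duration.ofMinutes(10), Duration.ofMinutes(30));
        System.out.println(policy.effectiveTimeout(Optional.empty()));
        System.out.println(policy.effectiveTimeout(Optional.of(Duration.ofHours(2))));
    }
}
```

Capping rather than rejecting an oversized request is a design choice of this sketch; the proposal itself does not say which behaviour is intended.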


[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup

2014-07-08 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055231#comment-14055231
 ] 

Bikas Saha commented on YARN-2261:
--

+1 for having the control/responsibility in YARN.
An alternative that may fit better with the RM model of launching AMs is to 
optionally have the RM run the AM in cleanup mode. This way the cleanup logic 
can reside in the AM as it does today, and the RM does not need to learn any 
new tricks about launching anything other than the AM. The existing launch 
context is used to launch the AM, and the AM is told (via env or via register) 
that it's in cleanup mode. The AM can use its logic to do cleanup and then 
successfully unregister with the RM. Until the unregister happens, the RM can 
keep restarting the AM in cleanup mode for a max of N times (to handle 
unexpected failures). 
When an AM is running in cleanup mode it will not be allowed to make any 
allocation requests. This can be handled via AMRMClient so that AMs that use 
AMRMClient don't need to do anything. The ApplicationMasterService, of course, 
will need to handle this for non-AMRMClient AMs.
By this method, minimal changes will be needed in the API and RM internals to 
enable this feature in a compatible manner.
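
A minimal sketch of the restart loop described above, in plain Java with no Hadoop dependency. Everything named here is hypothetical: the environment variable `EXAMPLE_AM_CLEANUP_MODE` is not defined by YARN, and the `Predicate` merely stands in for "launch the AM with its existing launch context and report whether it unregistered successfully".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical RM-side helper that reruns the AM in cleanup mode up to N times. */
final class CleanupModeRelauncher {

    /** Illustrative environment variable name; an assumption, not a YARN constant. */
    static final String CLEANUP_ENV = "EXAMPLE_AM_CLEANUP_MODE";

    /**
     * Relaunches the AM in cleanup mode until it unregisters or attempts run out.
     *
     * @return the number of attempts used on success, or -1 if every attempt failed
     */
    static int runCleanup(Map<String, String> baseEnv, int maxAttempts,
                          Predicate<Map<String, String>> launchAndAwaitUnregister) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Reuse the existing launch environment and add only the mode flag.
            Map<String, String> env = new HashMap<>(baseEnv);
            env.put(CLEANUP_ENV, "true");
            if (launchAndAwaitUnregister.test(env)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated AM: the first cleanup attempt crashes, the second unregisters.
        int used = runCleanup(new HashMap<>(), 3, env -> ++calls[0] == 2);
        System.out.println("attempts used: " + used);
    }
}
```

The sketch leaves out the rule that a cleanup-mode AM must not make allocation requests; per the comment, that check would live in AMRMClient and in the ApplicationMasterService.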
