[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203522#comment-15203522
 ] 

Vinod Kumar Vavilapalli commented on YARN-4837:
-----------------------------------------------

bq. "DISKS_FAILED" shouldn't be skipped for the reason I mentioned in 
YARN-4576. Also, we cannot simply judge system innocent when hitting memory 
issues.
As [~vvasudev] pointed out [here on 
YARN-4576|https://issues.apache.org/jira/browse/YARN-4576?focusedCommentId=15202664&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15202664],
 the right solution is to have the RM detect bouncing nodes and stop allocating 
new containers to them until they stabilize.
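
To make the idea concrete, here is a minimal sketch of how bouncing-node detection could work. It is not existing YARN code: the class name BouncingNodeTracker, its methods, and the threshold and window parameters are all hypothetical, and a real RM-side implementation would have to hook into node health reports and the scheduler. It assumes "bouncing" means more than a set number of health-state changes inside a sliding time window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of YARN: flags nodes that flap between healthy and unhealthy. */
class BouncingNodeTracker {
  private final int maxTransitions;  // state changes tolerated inside the window (assumed knob)
  private final long windowMillis;   // length of the sliding window (assumed knob)
  private final Map<String, Deque<Long>> transitions = new HashMap<>();

  BouncingNodeTracker(int maxTransitions, long windowMillis) {
    this.maxTransitions = maxTransitions;
    this.windowMillis = windowMillis;
  }

  /** Record one health-state change (healthy to unhealthy or back) for a node. */
  void recordTransition(String nodeId, long nowMillis) {
    transitions.computeIfAbsent(nodeId, k -> new ArrayDeque<>()).addLast(nowMillis);
  }

  /** A node is bouncing if it changed state more than maxTransitions times within the window. */
  boolean isBouncing(String nodeId, long nowMillis) {
    Deque<Long> times = transitions.get(nodeId);
    if (times == null) {
      return false;
    }
    // Drop transitions that have aged out of the window.
    while (!times.isEmpty() && nowMillis - times.peekFirst() > windowMillis) {
      times.removeFirst();
    }
    return times.size() > maxTransitions;
  }

  /** The scheduler would skip new allocations on a node until it stops bouncing. */
  boolean canAllocate(String nodeId, long nowMillis) {
    return !isBouncing(nodeId, nowMillis);
  }
}
```

Because old transitions age out of the window, a node becomes eligible again on its own once it stays stable, so no user-facing blacklist is needed.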

bq. Also, hide all AM scheduling info/preference from application doesn't make 
sense in long time: AM can ask for resources for its running containers in the 
beginning, but application cannot ask how to place its AM even today which is 
sad to me.
My earlier comment was a little inaccurate when I talked about "hiding 
AM-container-scheduling inside the RM". What I really meant is that any 
automatic scheduling decision arising from system failures/events should be 
hidden from end-users - just like preemption-handling! We already have 
ResourceRequest as part of AM-launch-context, and there is no reason why we 
cannot add more such things. However, this is different from the RM 
automatically ruling out nodes, as was done in YARN-2005 and related JIRAs.

bq. YARN-4685 is something fixable and much better than the age without 
blacklist (we do see AM keep launching on bad nodes repeatedly and get stuck in 
many cases). We just need to go ahead to fix YARN-4685.
YARN-4685 happened because of an inappropriate solution to a real problem - we 
should pause going down this route till we figure out the right solution.

> User facing aspects of 'AM blacklisting' feature need fixing
> ------------------------------------------------------------
>
>                 Key: YARN-4837
>                 URL: https://issues.apache.org/jira/browse/YARN-4837
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
