[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338830#comment-14338830
 ] 

Karthik Kambatla commented on YARN-3231:
----------------------------------------

Thanks for reporting and working on this, [~l201514]. The approach looks 
generally good. A few comments (some nits):
# Rename {{updateRunnabilityonRefreshQueues}} to {{updateRunnabilityOnReload}}? 
And, add a javadoc for when it should be called and what it does.
# Add a javadoc for the newly added private method that explains the 
significance of the new integer param.
# Call the above method from AllocationReloadListener#onReload after all the 
other queue configs are updated.
# The comment here no longer applies. Remove it? 
{code}
        // No more than one app per list will be able to be made runnable, so
        // we can stop looking after we've found that many
        if (noLongerPendingApps.size() >= maxRunnableApps) {
          break;
        }
{code}
# Indentation:
{code}
    updateAppsRunnability(appsNowMaybeRunnable,
                appsNowMaybeRunnable.size());
{code}
# Newly added tests:
## If it is not too much trouble, can we move them to a new test class 
(TestAppRunnability?) mostly because TestFairScheduler has so many tests 
already. 
## Is it possible to reuse the code between these tests? 
## Should we add tests for when the maxRunnableApps for a user or queue is 
decreased? If you think this might need additional work in the logic as well, I 
am open to filing a follow-up JIRA and addressing it there. 
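
To make the increase and decrease cases concrete, here is a minimal sketch of 
the behavior the tests could pin down. It is a simplified stand-in, not the 
FairScheduler code: the RunnabilitySketch and Queue classes, the String app 
ids, and the accessor names are all hypothetical, and it assumes that lowering 
the limit leaves already-running apps alone and only blocks new promotions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of one queue; not the FairScheduler classes.
public class RunnabilitySketch {

  static class Queue {
    private int maxRunnableApps;
    private final List<String> runnable = new ArrayList<>();
    private final Deque<String> pending = new ArrayDeque<>();

    Queue(int maxRunnableApps) {
      this.maxRunnableApps = maxRunnableApps;
    }

    // An app runs immediately if the queue is under its limit, else it waits.
    void submit(String appId) {
      if (runnable.size() < maxRunnableApps) {
        runnable.add(appId);
      } else {
        pending.addLast(appId);
      }
    }

    /**
     * Applies a reloaded limit and promotes pending apps in submission order
     * until the new limit is reached. Returns the apps that became runnable.
     * Assumption: a lower limit never demotes apps that are already running.
     */
    List<String> updateRunnabilityOnReload(int newMaxRunnableApps) {
      maxRunnableApps = newMaxRunnableApps;
      List<String> promoted = new ArrayList<>();
      while (!pending.isEmpty() && runnable.size() < maxRunnableApps) {
        String appId = pending.pollFirst();
        runnable.add(appId);
        promoted.add(appId);
      }
      return promoted;
    }

    int runnableCount() {
      return runnable.size();
    }

    int pendingCount() {
      return pending.size();
    }
  }

  public static void main(String[] args) {
    Queue queue = new Queue(1);
    queue.submit("app1");
    queue.submit("app2");
    queue.submit("app3");

    // Raising the limit should unblock the waiting apps.
    System.out.println(queue.updateRunnabilityOnReload(3)); // [app2, app3]

    // Lowering the limit promotes nothing and keeps running apps running.
    System.out.println(queue.updateRunnabilityOnReload(1)); // []
    System.out.println(queue.runnableCount());              // 3
  }
}
```

If the decrease case should behave differently in the real scheduler (for 
example, by moving apps back to pending), the second half of main is the part 
that would change.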


> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3231
>                 URL: https://issues.apache.org/jira/browse/YARN-3231
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Siqi Li
>            Assignee: Siqi Li
>            Priority: Critical
>         Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, none 
> of the pending jobs were assigned any resources, and they were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
