Varun Vasudev commented on YARN-3248:

Thanks for the feedback [~ozawa], [~vinodkv]. 

bq. The blacklist is an instance of HashSet, so it can throw 
ConcurrentModificationException when blacklist is modified in another thread. 
One alternative is to use Collections.newSetFromMap(new 
ConcurrentHashMap<Object,Boolean>()) instead of HashSet.

Good catch. Collections.newSetFromMap won't work because the blacklist itself 
is a set. I create a copy of the structure in the latest patch.
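
For illustration only, a minimal sketch of the copy approach (this is not the 
code in the patch). The class BlacklistSnapshot and its methods are hypothetical 
names, not YARN APIs, and the sketch assumes that all reads and writes of the 
set go through the same lock. Readers such as the web UI iterate over the copy, 
so a concurrent update cannot cause a ConcurrentModificationException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class for illustration; not part of YARN.
public class BlacklistSnapshot {
    // Plain HashSet, guarded by this object's monitor.
    private final Set<String> blacklist = new HashSet<>();

    // Writer path: apply additions and removals under the lock.
    public synchronized void update(Set<String> additions, Set<String> removals) {
        blacklist.addAll(additions);
        blacklist.removeAll(removals);
    }

    // Reader path: copy under the same lock, then hand out a read-only view.
    // Callers iterate over the copy, never over the live set.
    public synchronized Set<String> snapshot() {
        return Collections.unmodifiableSet(new HashSet<>(blacklist));
    }

    public static void main(String[] args) {
        BlacklistSnapshot holder = new BlacklistSnapshot();
        holder.update(new HashSet<>(Arrays.asList("node1", "node2")),
                Collections.<String>emptySet());

        Set<String> view = holder.snapshot();

        // Later updates do not affect the snapshot taken earlier.
        holder.update(new HashSet<>(Arrays.asList("node3", "node4")),
                new HashSet<>(Arrays.asList("node1")));

        System.out.println("snapshot size: " + view.size());
        System.out.println("current size: " + holder.snapshot().size());
    }
}
```

The trade-off is an extra copy per read, which is cheap as long as blacklists 
stay small.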

bq. If AbstractYarnScheduler#getApplicationAttempt() can be used, I think it's 
more straightforward and simple. What do you think?

Agreed. Changed the code.

bq. Could you add tests to TestRMWebServicesApps?

I'm not sure what tests to add. I'm not adding any new web services.

bq. The blacklist information is per application-attempt, and scheduler will forget 
previous application-attempts today. I think this is a general behaviour with 
the way blacklisting is done today - each AM is expected to explicitly 
blacklist all the nodes it wants to blacklist even if the previous attempt 
already informed about some of them before. That is how all of resource 
requests work. Given the above, we should make it clear that blacklists are 
really for this app-attempt.

I was under this impression as well, but the information is maintained on a 
per-app basis in the AbstractYarnScheduler:
protected Map<ApplicationId, SchedulerApplication<T>> applications;

bq. W.r.t UI, showing the list of all the nodes is going to be a UI scalability 
problem - how about we move this list to the per-app page? That is the place 
where this is useful the most.

Agreed. Made the change.

bq. We should also add this information to the web-services.

You mean the app information web service?

> Display count of nodes blacklisted by apps in the web UI
> --------------------------------------------------------
>                 Key: YARN-3248
>                 URL: https://issues.apache.org/jira/browse/YARN-3248
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Screenshot.jpg, apache-yarn-3248.0.patch
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.

