[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475058#comment-16475058 ]
Haibo Chen commented on YARN-8248:
----------------------------------

{quote}as {{RMAppManager.validateAndCreateResourceRequest()}} can return a null value for the AM requests, {quote}
Good catch! It does indeed return null if the AM is unmanaged. But I am not sure how the debug message helps diagnose this issue. I'd prefer we remove the debug message.

{quote} Is this explanation makes it cleaner? {quote}
Yes. That makes sense. Comments would be very helpful in this case. We could also maybe reverse the order of the two conditions. The current diagnostic message seems good to me now that I understand what the condition means.

{quote} So in my understanding, it can happen that in {{addApplication()}} the app was not rejected, for example AM does not request vCores and we have 0 vCores configure as max resources, but for a map container, 1 vCores is requested. {quote}
Indeed, that can happen with custom resource types. In FairScheduler.allocate(), instead of rejecting the application if any request is rejected, we can just filter out the requests that should be rejected by removing them from the ask list (with a warning log) and proceed. Rejecting an application after it has started running (FairScheduler.allocate() is called remotely by the AM) seems counter-intuitive. I think we can signal the AM by throwing a SchedulerInvalidResoureRequestException, which is propagated to the AM. What do you think?

{quote}About the uncovered unit test: Good point and I was thinking about that if we can reject an application only if the AM request is greater than 0 and we have 0 configured as max resource or simply in any case where the requested resource is greater than max resource, regardless if it is 0 or not. {quote}
Never mind comment 4). That was based on my previous misunderstanding. If the AM request is larger than the non-zero max resource (steady fair share), we should not reject, because the queue may get an instantaneous fair share that is large enough. That's not related to this patch.
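To make the filtering idea for FairScheduler.allocate() concrete, here is a minimal sketch under loud assumptions: it does not use any YARN classes, resources are modeled as plain maps from a resource name to an amount, and {{AskFilterSketch}}, {{resources()}}, {{asksForZeroCappedType()}} and {{filterAsks()}} are hypothetical names invented for illustration. It only shows the rule discussed above (drop a request when it asks for more than 0 of a resource type whose queue maximum is 0, log a warning, keep the rest); it is not the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: resources are plain maps here, not YARN's Resource class.
public class AskFilterSketch {

    // Hypothetical helper: builds a resource vector with two resource types.
    public static Map<String, Long> resources(long memoryMb, long vcores) {
        Map<String, Long> r = new LinkedHashMap<>();
        r.put("memory-mb", memoryMb);
        r.put("vcores", vcores);
        return r;
    }

    // True if the ask wants more than 0 of a type that the queue caps at 0.
    public static boolean asksForZeroCappedType(Map<String, Long> ask,
                                                Map<String, Long> queueMax) {
        for (Map.Entry<String, Long> e : ask.entrySet()) {
            // Assumption: a type absent from queueMax is treated as uncapped.
            long max = queueMax.getOrDefault(e.getKey(), Long.MAX_VALUE);
            if (e.getValue() > 0 && max == 0) {
                return true;
            }
        }
        return false;
    }

    // Keeps the valid asks and drops the invalid ones with a warning,
    // instead of rejecting the whole application.
    public static List<Map<String, Long>> filterAsks(List<Map<String, Long>> asks,
                                                     Map<String, Long> queueMax) {
        List<Map<String, Long>> kept = new ArrayList<>();
        for (Map<String, Long> ask : asks) {
            if (asksForZeroCappedType(ask, queueMax)) {
                System.err.println("WARN: dropping request " + ask
                    + ", queue max is " + queueMax);
            } else {
                kept.add(ask);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Queue limits from the issue description: 90000 mb, 0 vcores.
        Map<String, Long> queueMax = resources(90000, 0);
        List<Map<String, Long>> asks = new ArrayList<>();
        asks.add(resources(1536, 0)); // a request that asks for no vcores
        asks.add(resources(1024, 1)); // a map container asking for 1 vcore
        System.out.println("kept: " + filterAsks(asks, queueMax));
    }
}
```

With the queue limits from the issue description, the request that asks for no vcores is kept and the one asking for 1 vcore is dropped. How the AM is then told about the dropped request (for example through the exception mentioned above) is deliberately left out of the sketch.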
Let me know if something does not make sense.

> Job hangs when a queue is specified and the maxResources of the queue cannot
> satisfy the AM resource request
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8248
>                 URL: https://issues.apache.org/jira/browse/YARN-8248
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, yarn
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-8248-001.patch, YARN-8248-002.patch,
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch,
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission
> since the specified queue cannot serve the resource needs of the submitted
> job.
>
> Command to run:
> {code:java}
> bin/yarn jar
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar"
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>
> {code:java}
> <queue name="sample_queue">
>   <minResources>10000 mb,0vcores</minResources>
>   <maxResources>90000 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI:
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is
> not yet activated. (Resource request: <memory:1536, vCores:1> exceeds current
> queue or its parents maximum resource allowed).{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)