tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster 
manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590447561
 
 
   > General question about priority, I did not find much here [1].
   > How is the value of priority interpreted ?
   > Is it simply to "tag" requests ?
   > Or are higher priority requests 'prioritized' over lower priority requests 
from an application (to a queue) ?
   > 
   > How does it compare with [2] ? Will that be cleaner (using tags) ?
   > 
   > [1] 
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/Priority.html
   > 
   > [2] 
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/SchedulingRequest.html
   
   I don't think Priority is documented very well at all. We ran into this 
issue with Tez, where you can't have different container sizes within the same 
Priority. A priority is what it sounds like: higher priorities get allocated 
first. For Spark I don't think this matters, since we finish a stage before 
proceeding to the next. If we had a slow-start feature like MapReduce, then it 
would matter. It does mean that if you have 2 stages with different 
ResourceProfiles running at the same time, one stage's containers would be 
prioritized over the other's, but again I don't think that is an issue. If you 
can think of a case where it would be, let me know. There is actually a way to 
avoid using different priorities, but you have to turn on a feature in YARN to 
use something like tags. Since that is an optional feature, I didn't want to 
rely on it, and I didn't see any issues with Priority.
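
   To make the restriction concrete, here is a toy Python model. It is not 
YARN or Spark code: `ToyRequestTable`, `ContainerSize`, and 
`priority_for_profile` are hypothetical names, and both the "smaller number is 
served first" ordering and the profile-id-to-priority mapping are assumptions 
of this sketch. It only illustrates the idea of one container size per 
priority, with each resource profile getting its own priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSize:
    memory_mb: int
    vcores: int


class ToyRequestTable:
    """Toy stand-in for an allocator that allows one container size per priority."""

    def __init__(self):
        self._size_by_priority = {}
        self._pending = {}

    def add_request(self, priority, size, count=1):
        # The first request at a priority fixes the size for that priority.
        existing = self._size_by_priority.setdefault(priority, size)
        if existing != size:
            raise ValueError(
                f"priority {priority} already holds {existing}, cannot add {size}"
            )
        self._pending[priority] = self._pending.get(priority, 0) + count

    def allocation_order(self):
        # Assumption of this sketch: the smaller number is served first.
        return sorted(self._pending)


def priority_for_profile(profile_id):
    # Hypothetical mapping: one distinct priority per resource profile.
    return profile_id


table = ToyRequestTable()
table.add_request(priority_for_profile(0), ContainerSize(4096, 2), count=3)
table.add_request(priority_for_profile(1), ContainerSize(8192, 4), count=2)
print(table.allocation_order())
```

   With a distinct priority per profile, both sizes coexist; asking for a 
second size at an existing priority raises `ValueError` in this model. The 
side effect is the one described above: one profile's requests are ordered 
ahead of the other's.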
   
   I haven't looked at SchedulingRequest in detail, but it's more about 
placement and gang scheduling - 
https://issues.apache.org/jira/browse/YARN-6592. That is definitely something 
interesting, but I would prefer to do it separately from this, unless you see 
an issue with Priority? I can look at it more to see if it would get around 
having to use Priority, but the SchedulingRequest itself also has a priority, 
though it has a separate resource sizing. I would almost bet it has the same 
restriction, but maybe it's using the tags to get around this.
