cheersyang commented on a change in pull request #61:
URL:
https://github.com/apache/incubator-yunikorn-site/pull/61#discussion_r660195875
##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -206,6 +214,64 @@ Annotations:
Once the job is submitted to the scheduler, the job won’t be scheduled
immediately.
Instead, the scheduler will ensure it gets its minimal resources before
actually starting the driver/executors.
+## Gang scheduling Styles
+
+Initially when the app encountered gang issues due to placeholder pod
allocation(failed due to various reasons), we marked the application failed
without retrying it. This wasn’t a really user friendly experience, so it led
to a demand of making the gangs scheduling style configurable and make it
possible to succeed to schedule the app through a fallback mechanism.
+
+To solve this issue we defined two Gang scheduling styles: Soft and Hard.
Review comment:
I think we can simplify this into shorter sentences, such as "Two gang
scheduling styles are supported: Soft and Hard. The style can be configured
per app to define how the app behaves when gang scheduling fails."
##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -206,6 +214,64 @@ Annotations:
Once the job is submitted to the scheduler, the job won’t be scheduled
immediately.
Instead, the scheduler will ensure it gets its minimal resources before
actually starting the driver/executors.
+## Gang scheduling Styles
+
+Initially when the app encountered gang issues due to placeholder pod
allocation(failed due to various reasons), we marked the application failed
without retrying it. This wasn’t a really user friendly experience, so it led
to a demand of making the gangs scheduling style configurable and make it
possible to succeed to schedule the app through a fallback mechanism.
+
+To solve this issue we defined two Gang scheduling styles: Soft and Hard.
+
+- `Hard style`: when this style is used, we will have the initial behavior,
more precisely if the application cannot be scheduled according to gang
scheduling rules, and it times out, it will be marked as failed, without
retrying to schedule it.
Review comment:
Suggested wording: "When the app cannot be gang scheduled, it will be marked
as failed, without retrying to schedule it."
##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -101,6 +101,14 @@ could not schedule all the placeholder pods, it will
eventually give up after a
freed up and used by other apps. If non of the placeholders can be allocated,
this timeout won't kick-in. To avoid the placeholder
pods stuck forever, please refer to
[troubleshooting](trouble_shooting.md#gang-scheduling) for solutions.
+` gangSchedulingStyle`
+
+Possible values: *Soft*, *Hard*
Review comment:
Possible values -> Valid values
##########
File path: docs/user_guide/gang_scheduling.md
##########
@@ -206,6 +214,64 @@ Annotations:
Once the job is submitted to the scheduler, the job won’t be scheduled
immediately.
Instead, the scheduler will ensure it gets its minimal resources before
actually starting the driver/executors.
+## Gang scheduling Styles
+
+Initially when the app encountered gang issues due to placeholder pod
allocation(failed due to various reasons), we marked the application failed
without retrying it. This wasn’t a really user friendly experience, so it led
to a demand of making the gangs scheduling style configurable and make it
possible to succeed to schedule the app through a fallback mechanism.
+
+To solve this issue we defined two Gang scheduling styles: Soft and Hard.
+
+- `Hard style`: when this style is used, we will have the initial behavior,
more precisely if the application cannot be scheduled according to gang
scheduling rules, and it times out, it will be marked as failed, without
retrying to schedule it.
+- `Soft style`: using this style will make it possible to schedule a gang
application as a normal, simple application if it cannot be scheduled and
started by following the gang scheduling rules. This means that in case of the
placeholder timeout the placeholders will be deleted and the application state
will transition to Resuming state. After all the placeholders are deleted, the
application will transition into Accepted state and the app’s pods will be
scheduled according to the non-gang application scheduling logic.
Review comment:
Suggested wording: "When the app cannot be gang scheduled, it falls back to
normal scheduling, and the non-gang scheduling strategy is used for
best-effort scheduling. When this happens, the app transitions to the Resuming
state and all remaining placeholder pods are cleaned up."
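For reference, the behavior of the two styles can be sketched roughly as
below. This is only an illustration in Go: the type and function names are
hypothetical and not taken from the scheduler code, and the state names are
the ones used in the doc text quoted above.

```go
package main

import "fmt"

// Style is a hypothetical type for the gangSchedulingStyle values.
type Style string

const (
	Soft Style = "Soft"
	Hard Style = "Hard"
)

// State uses the application state names mentioned in the quoted doc text.
type State string

const (
	Failed   State = "Failed"
	Resuming State = "Resuming"
	Accepted State = "Accepted"
)

// onPlaceholderTimeout returns the state an app enters when it cannot be
// gang scheduled before the placeholder timeout (assumed behavior).
func onPlaceholderTimeout(s Style) State {
	if s == Soft {
		// Soft: fall back to normal scheduling, placeholders get deleted.
		return Resuming
	}
	// Hard: no retry, the app is marked as failed.
	return Failed
}

// onPlaceholdersDeleted returns the next state once all placeholders are
// cleaned up. Only a Resuming app moves on to Accepted.
func onPlaceholdersDeleted(cur State) State {
	if cur == Resuming {
		return Accepted
	}
	return cur
}

func main() {
	for _, s := range []Style{Hard, Soft} {
		st := onPlaceholderTimeout(s)
		fmt.Printf("%s: timeout -> %s, cleanup -> %s\n", s, st, onPlaceholdersDeleted(st))
	}
}
```

With Hard the app stays failed after the timeout, while with Soft it goes
through Resuming and is then scheduled as a regular, non-gang app.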