[GitHub] [spark] bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-571476226

@tgravescs @cloud-fan Thanks for taking a look!
[GitHub] [spark] bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-568617329

I think I came up with a much better approach [here](https://github.com/apache/spark/compare/master...bmarcott:nmarcott-fulfill-slots-2?expand=1). It avoids trying to simulate scheduling logic like the previous approach, which had a lot of discrepancies as well as high time complexity.

This change makes `TaskSetManager.resourceOffer` return an explicit boolean indicating whether it rejected the resource due to delay scheduling. An `isAllFreeResources` boolean parameter was also added to `TaskSchedulerImpl.resourceOffers`, which tells the scheduler that the offers represent all free resources as opposed to a single resource. Timers are then reset only if no resources have been rejected due to delay scheduling since the last offer that included all free resources (see the sketch after this comment).

Example event sequence:
- offer 1 resource that was rejected - no timer reset
- offer all resources with no rejects - timer is reset
- offer 1 resource, no reject - timer is reset
- offer 1 resource that was rejected - no timer reset
- offer 1 resource, no reject - no timer reset because the previous offer was rejected

Here is a breakdown of when resources are offered (not changed by this PR):

Single executors are offered when:
- a task finishes
- a new executor is launched

All free resources are offered when:
- continually, every spark.scheduler.revive.interval seconds (default 1 second)
- on taskset submit
- when a task fails
- the speculation scheduler revives on a fixed delay if there are any speculative tasks
- an executor is lost

One remaining case that isn't handled: before any "all free resources" offer, all free resources are offered one by one and none are rejected. This case should reset the timer, but won't with the current implementation.

Thoughts, or do you know of any other issues with this approach?
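To make the reset rule above concrete, here is a minimal Scala sketch of the described behavior. It is not the actual PR diff; names such as `noRejectsSinceLastFullOffer`, `tsmRejectsForDelayScheduling`, and `resetDelaySchedulingTimer` are hypothetical stand-ins.

```scala
// Minimal sketch of the reset rule described above; not the actual PR diff.
// Names such as noRejectsSinceLastFullOffer are hypothetical.
class DelaySchedulingResetSketch {
  // True while nothing has been rejected because of delay scheduling since the
  // last offer round that contained all free resources.
  private var noRejectsSinceLastFullOffer = true

  /** offers: free resources being offered; isAllFreeResources: whether this
    * round contains every free resource rather than a single freed slot. */
  def resourceOffers(offers: Seq[String], isAllFreeResources: Boolean): Unit = {
    if (isAllFreeResources) {
      // Every "all free resources" round starts a fresh observation window.
      noRejectsSinceLastFullOffer = true
    }
    // In the PR, TaskSetManager.resourceOffer reports whether it rejected the
    // resource purely because of delay scheduling; this stands in for that.
    val rejections = offers.map(tsmRejectsForDelayScheduling)
    if (rejections.contains(true)) {
      noRejectsSinceLastFullOffer = false
    } else if (noRejectsSinceLastFullOffer) {
      // No rejects since the last full round: the task set is keeping up with
      // the slots it is given, so its locality wait timer starts over.
      resetDelaySchedulingTimer()
    }
  }

  // Placeholders so the sketch is self-contained.
  private def tsmRejectsForDelayScheduling(offer: String): Boolean = false
  private def resetDelaySchedulingTimer(): Unit = ()
}
```

Walking the example event sequence above through this sketch reproduces the stated resets: a rejected single offer flips the flag off, a clean full-resources round turns it back on and resets, and subsequent clean single offers keep resetting until another rejection occurs.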
[GitHub] [spark] bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-568617329

[Here is an approach](https://github.com/apache/spark/compare/master...bmarcott:nmarcott-fulfill-slots-2?expand=1) which avoids trying to simulate any type of scheduling logic. This change introduces an `isAllFreeResources` boolean parameter for `TaskSchedulerImpl.resourceOffers`, which tells the scheduler that the offers represent all free resources as opposed to a single resource. Timers will be reset only if there were no resources rejected due to delay scheduling since the last offer which included all free resources.

Example case:
- offer 1 resource that was rejected - no timer reset
- offer all resources with no rejects - timer is reset
- offer 1 resource, no reject - timer is reset
- offer 1 resource that was rejected - no timer reset
- offer 1 resource, no reject - no timer reset because the previous offer was rejected

Here is a breakdown of when resources are offered (not changed by this PR):

Single executors are offered when:
- a task finishes
- a new executor is launched

All free resources are offered when:
- continually, every spark.scheduler.revive.interval seconds (default 1 second)
- on taskset submit
- when a task fails
- the speculation scheduler revives on a fixed delay if there are any speculative tasks
- an executor is lost

One remaining case that isn't handled: before any "all free resources" offer, all free resources are offered one by one and none are rejected. This case should reset the timer, but won't with the current implementation.

Thoughts, or do you know of any other issues with this approach?
[GitHub] [spark] bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-568617329

[Here is an approach](https://github.com/apache/spark/compare/master...bmarcott:nmarcott-fulfill-slots-2?expand=1) which avoids trying to simulate any type of scheduling logic. This change resets locality timers only when no resources have been rejected due to delay scheduling.

The complication in this change is that resetting the locality timer depends on when `TaskSchedulerImpl.resourceOffers` is called with all resources, rather than a single resource. That is guaranteed to happen only as frequently as `spark.scheduler.revive.interval`, which defaults to 1s. This means the `locality.wait.time` wouldn't be completely respected. One way around this would be to always offer all free resources on calls to `TaskSchedulerImpl.resourceOffers`.

Thoughts, or do you know of any other issues with this approach?
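For reference, the revive interval mentioned above is an ordinary Spark configuration. A minimal sketch of setting it follows; the 1s value is just the documented default, not a tuning recommendation.

```scala
import org.apache.spark.SparkConf

// spark.scheduler.revive.interval controls how often the driver performs the
// periodic "offer all free resources" round. Because timer resets in this
// approach piggyback on that round, a reset can lag by up to this interval.
val conf = new SparkConf()
  .setAppName("delay-scheduling-revive-interval")
  .set("spark.scheduler.revive.interval", "1s")
```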
[GitHub] [spark] bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-562798874

Inline responses for @tgravescs:

> arguably you should never reset back to a lower level.

Yeah, I was thinking the same thing, but I decided not to touch that part yet, since I have mostly thought about **when** to reset.

> Note on the fairscheduler side - I think you do have the issue Kay brought up in v2 of comment: https://issues.apache.org/jira/browse/SPARK-18886?focusedCommentId=15931009&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15931009

I do not believe this PR has that issue: once the non-greedy taskset has waited its locality wait time, it will increase its locality level, since it wasn't able to take advantage of its slots. Her proposed v1 solution had that problem because it only increased the locality level if there were slots not being used by **any taskset**.

> I assume it fixes the performance on #26633? thanks for adding more unit tests, did run any manual tests as well?

I haven't looked in depth at their problem, but it does seem it would solve most of it. I have not yet had the chance to run manual tests.

> so looking at the fairscheduler side of this it brings me back to the real question of what is the definition of a task wait. If you look at Kay's example:

I did not catch your point from the comment that started with the above quote. This PR handles the case she mentioned, since the locality level would not increase while the taskset is utilizing all its available slots (timers will keep getting reset). That means that when the taskset is offered the new non-local resource, it wouldn't immediately take advantage of it.
[GitHub] [spark] bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott edited a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-562798874

Inline responses for @tgravescs:

> arguably you should never reset back to a lower level.

Yeah, I was thinking the same thing, but I decided not to touch that part yet, since I have mostly thought about **when** to reset.

> Note on the fairscheduler side - I think you do have the issue Kay brought up in v2 of comment: https://issues.apache.org/jira/browse/SPARK-18886?focusedCommentId=15931009&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15931009

I do not believe this PR has that issue: once the non-greedy taskset has waited its locality wait time, it will increase its locality level, since it wasn't able to take advantage of its slots. Her proposed v1 solution had that problem because it only increased the locality level if there were slots not being used by **any job**.

> I assume it fixes the performance on #26633? thanks for adding more unit tests, did run any manual tests as well?

I haven't looked in depth at their problem, but it does seem it would solve most of it. I have not yet had the chance to run manual tests.

> so looking at the fairscheduler side of this it brings me back to the real question of what is the definition of a task wait. If you look at Kay's example:

I did not catch your point from the comment that started with the above quote. This PR handles the case she mentioned, since the locality level would not increase while the taskset is utilizing all its available slots (timers will keep getting reset). That means that when the taskset is offered the new non-local resource, it wouldn't immediately take advantage of it.
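For readers less familiar with the locality level escalation being discussed, here is a heavily simplified Scala sketch of how a delay-scheduling timer gates the move from more-local to less-local levels. It mirrors the general idea behind `TaskSetManager`'s locality waits but is not Spark's actual code; the names, the single shared wait, and the `resetTimer` hook are illustrative assumptions.

```scala
// Heavily simplified illustration of locality level escalation; not Spark's
// actual TaskSetManager, and the names are illustrative only.
object DelaySchedulingLevelSketch {
  // Locality levels from most to least local.
  private val levels = Seq("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY")

  private var currentLevelIndex = 0
  // The timer this PR is about: the proposal resets it whenever the task set
  // has been able to use all of the slots it was offered.
  private var lastTimerReset = System.currentTimeMillis()
  private val localityWaitMs = 3000L // e.g. spark.locality.wait = 3s

  /** Returns the least-local level the task set is currently allowed to use. */
  def allowedLevel(now: Long): String = {
    // Escalate to a less local level only after the locality wait has elapsed
    // since the last timer reset; frequent resets keep tasks local.
    if (currentLevelIndex < levels.length - 1 &&
        now - lastTimerReset >= localityWaitMs) {
      currentLevelIndex += 1
      lastTimerReset = now
    }
    levels(currentLevelIndex)
  }

  /** Called when the task set is fully utilizing its slots (per this PR's rule). */
  def resetTimer(now: Long): Unit = {
    lastTimerReset = now
  }
}
```

Under this model, a taskset that keeps its timer reset (because it is using every slot it is offered) stays at the more local level, which is the behavior the comment above relies on when arguing the fair scheduler case is handled.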