Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/2746#issuecomment-59280725
@andrewor14 this looks cool! Two high level comments:
(1) Right now, there are a bunch of different kinds of timers that can be
set and then triggered later. Is it possible to do something much simpler,
where you just have a single timer, and when the timer goes off, you check all
of the state to see what should happen next? For example, for executor removal,
you could have a map of executors to when they were last used (or sorted list
based on when they were last used?) and remove any that haven't been used for a
threshold amount of time. As I read the code, I was wondering if this would
significantly simplify a bunch of the code, with negligible performance impact
(?).
(2) I'm a little concerned about the changes to the scheduler code. In
general, the scheduler has evolved to this point where the interface between
the various scheduler components is very messy, and it's really hard to
describe (does anyone even know?!) what happens in which part of the scheduler.
The way you've done this right now adds some logic in the task set manager (to
determine when something has changed on the level of the task set, in which
case the task scheduler needs to be notified) and to the task scheduler
implementation, to determine when something has changed across all task sets,
in which case it needs to notify the Allocation Manager.
Based on a cursory look, it looks like potentially *all* of the logic added
to the task scheduler impl could instead be done in the Allocation Manager (by
moving all of the data structures you added in lines 118 - 132 to the
Allocation Manager). I'd prefer this because then the Allocation Manager can
have a very narrow interface that shields the scheduler from a lot of
complexity.
In this case, you could pass a reference to Allocation Manager to each TSM,
which would call a "newPendingTask" method on the Allocation Manager (for
example). Similar to with the TaskSchedulerImpl, I think most of the
functionality in the TSM related to the allocation manager code could be pushed
into the allocation manager, so that the TSM doesn't have to handle any of the
complexity of tracking new information itself -- instead, it would just call a
method on the Allocation Manager when tasks get started / finish.
One potential issue I see with pushing all of the functionality from the
TSM into the allocation manager is that right now, the TSM can drop some
messages from getting sent back through the scheduler / allocation manager --
like when a task finishes on an executor, but there are other tasks for the
task set still running there, so the allocation manager doesn't need to know
about it. I'd guess this isn't a huge performance hit -- the complexity of all
of this logic is still O(number of tasks being scheduled) and is merited by the
simplicity benefit of having the AllocationManager do everything.
At the end of the day, it seems like the AllocationManager needs to know
when a task starts and when one finishes. So, can it just implement the
listener interface, and the scheduler doesn't directly interact with it at all?
It looks like you mentioned this idea in your design doc, so I'm wondering if
you stumbled upon some problem when you tried doing this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]