Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/7276#issuecomment-123487912
I took a medium-level pass, not too detailed, and the overall approach of moving
things into the event loop looks good. However, I have some high-level comments.
1. In the code refactoring, why did you move all of `registerReceiver`,
`deregisterReceiver`, etc. into the Endpoint class? That moves around a lot of
code, making it hard to review. It also makes the overall structure more
complex: some functionality is in the parent ReceiverTracker class and some is
in the Endpoint class, so it is hard to reason about what goes where in the
future. The earlier structure instead had the goal that all the functionality
lives in the ReceiverTracker, and the Endpoint class was just going to be the
event loop that accesses that functionality. I think it's more structured if
that is maintained.
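To illustrate what I mean, here is a minimal sketch of that separation (all names here are hypothetical, not the actual ReceiverTracker API): the tracker owns the state and the operations, and the endpoint is only an event loop that pattern-matches messages and delegates.

```scala
// Hypothetical sketch: logic and state stay in the tracker.
class ReceiverTracker {
  private val registered = scala.collection.mutable.Set[Int]()

  def registerReceiver(id: Int): Boolean = registered.add(id)
  def deregisterReceiver(id: Int): Boolean = registered.remove(id)
  def registeredCount: Int = registered.size
}

sealed trait TrackerMessage
case class RegisterReceiver(id: Int) extends TrackerMessage
case class DeregisterReceiver(id: Int) extends TrackerMessage

// The endpoint is only the event loop: it dispatches each message
// to the corresponding method on the tracker, holding no logic itself.
class ReceiverTrackerEndpoint(tracker: ReceiverTracker) {
  def receive(msg: TrackerMessage): Unit = msg match {
    case RegisterReceiver(id)   => tracker.registerReceiver(id)
    case DeregisterReceiver(id) => tracker.deregisterReceiver(id)
  }
}
```

With this shape, a future reader only has to look in one class to find the functionality; the endpoint stays a thin dispatch layer.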
2. I think the scheduling policy needs to be thought through. It may be the
case that we need two steps: first a bulk scheduling at the beginning, which
tries to distribute the receivers properly (n receivers on n executors) by
setting preferred locations on all of them; then every receiver that needs to
be restarted is scheduled one at a time. The schedulingPolicy can be changed to
take an array of receiverIds to accommodate this: initially all the receiverIds
are sent and the locations of all of them are returned; after that, one at a
time. How does that sound?
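Roughly, I am imagining something like the following sketch (class and method names are made up for illustration, not a proposed API): one method that takes an array of receiver ids, so the bulk call passes all ids and a restart passes a single id.

```scala
// Hypothetical sketch of the two-step policy: the same entry point
// handles bulk scheduling (all receiver ids at once) and
// single-receiver restarts (one id at a time).
class SchedulingPolicy(executors: Seq[String]) {
  // Returns a preferred executor per receiver id, spread round-robin
  // so that n receivers land on n executors.
  def schedule(receiverIds: Seq[Int]): Map[Int, String] =
    receiverIds.zipWithIndex.map { case (id, i) =>
      id -> executors(i % executors.size)
    }.toMap
}
```

Usage would be `policy.schedule(allReceiverIds)` once at startup to set preferred locations for everything, and `policy.schedule(Seq(restartedId))` for each later restart.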