Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-50288916
I see; my point was that we shouldn't try to make optimizations or major
refactorings here. They make it much harder to tell whether the code is
correct, and this code has already been tricky to debug because it's split
across two methods.
I think you could make it much simpler as follows:
- Don't add any extra parameters to resourceOffer
- Call it with NO_PREFS after NODE_LOCAL
- In resourceOffer, if the locality level passed in is NO_PREFS, don't
adjust the locality (basically what your Boolean is doing, except that instead
of passing a Boolean, you just check whether the target level is NO_PREFS). In
fact, if you'd like, you can even give NO_PREFS a separate top-level path in
resourceOffer, so that code dealing with it isn't scattered throughout. That
is, make the NO_PREFS call return *only* no-prefs tasks, not process-local or
node-local ones.
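To illustrate, here is a minimal sketch of the suggested control flow. All names and types are simplified stand-ins (not Spark's actual TaskSetManager API), and the pending queues are placeholder data:

```scala
import scala.collection.mutable

object ResourceOfferSketch {
  // Stand-ins for Spark's locality levels.
  sealed trait Locality
  case object PROCESS_LOCAL extends Locality
  case object NODE_LOCAL extends Locality
  case object NO_PREFS extends Locality

  // Hypothetical pending-task queues keyed by locality level.
  val pending = mutable.Map[Locality, List[Int]](
    PROCESS_LOCAL -> List(1), NODE_LOCAL -> List(2), NO_PREFS -> List(3))

  // No extra parameters: a NO_PREFS call takes a separate top-level path
  // and returns only no-prefs tasks, with no locality adjustment.
  def resourceOffer(maxLocality: Locality): Option[Int] = maxLocality match {
    case NO_PREFS =>
      dequeue(NO_PREFS)
    case _ =>
      // Normal path: prefer the best locality, then fall through to the
      // next level allowed by maxLocality.
      dequeue(PROCESS_LOCAL).orElse(
        if (maxLocality == NODE_LOCAL) dequeue(NODE_LOCAL) else None)
  }

  private def dequeue(level: Locality): Option[Int] =
    pending(level) match {
      case head :: tail => pending(level) = tail; Some(head)
      case Nil          => None
    }
}
```

In this shape the caller would offer NODE_LOCAL first and then NO_PREFS, matching the suggested ordering, and there is no parameter combination a reader has to reason about.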
The reason this is simpler is that every time you introduce parameters,
people can pass in any combination of them, and in this code only one
particular combination is actually expected. That's very hard for a new
reader of the code to figure out. The optimization with bottomLocality
might be useful, but it shouldn't be part of this bug-fix PR -- this PR
should just fix the bug. As I said before, I doubt it will make a huge
impact, because I haven't seen this code be a bottleneck in profiles.