Github user rezazadeh commented on the pull request:
https://github.com/apache/spark/pull/4934#issuecomment-77801646
Thank you for this PR @staple !
@mengxr I suggested that @staple first implement this without backtracking
to keep the PR as simple as possible. According to his plots (see JIRA), even
without backtracking, this PR needs fewer iterations at the same cost per
iteration.
Note that backtracking requires several additional map-reduces per
iteration, so it is unclear when backtracking is worth its cost. That is why
I suggested first merging the case that is a clear win (fewer iterations at
the same cost per iteration). I think we should merge this without
backtracking, and then open another PR to properly evaluate how backtracking
affects total cost, with the goal of also merging backtracking.
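To make the cost argument concrete, here is a rough sketch in plain Python.
It is not Spark code and is not taken from the PR. Each call to
`loss_and_grad` stands in for one map-reduce over the data. The toy dataset,
the step sizes, the sufficient-decrease test, and all names are assumptions
chosen only for illustration.

```python
# Toy cost model, not Spark code: every full pass over the data
# (one loss/gradient evaluation) stands in for one map-reduce job.

XS = [1.0, 2.0, 3.0]  # hypothetical features
YS = [2.0, 4.0, 6.0]  # hypothetical labels (y = 2 * x), so the optimum is w = 2


class PassCounter:
    """Least-squares objective that counts full passes over the data."""

    def __init__(self):
        self.passes = 0

    def loss_and_grad(self, w):
        self.passes += 1
        loss = sum(0.5 * (w * x - y) ** 2 for x, y in zip(XS, YS))
        grad = sum((w * x - y) * x for x, y in zip(XS, YS))
        return loss, grad


def fixed_step(iterations, step=0.05):
    """Gradient descent with a constant step: one pass per iteration."""
    obj, w = PassCounter(), 0.0
    for _ in range(iterations):
        _loss, grad = obj.loss_and_grad(w)
        w -= step * grad
    return w, obj.passes


def backtracking(iterations, t0=1.0, shrink=0.5, max_trials=30):
    """Gradient descent with a simple backtracking line search.

    Each trial step costs one extra pass to evaluate the loss at the
    candidate point. No results are cached between iterations.
    """
    obj, w = PassCounter(), 0.0
    for _ in range(iterations):
        loss, grad = obj.loss_and_grad(w)
        t = t0
        candidate = w
        for _trial in range(max_trials):
            candidate = w - t * grad
            cand_loss, _grad = obj.loss_and_grad(candidate)  # extra pass
            # Sufficient-decrease test; shrink the step until it holds.
            if cand_loss <= loss - 0.5 * t * grad * grad:
                break
            t *= shrink
        w = candidate
    return w, obj.passes


if __name__ == "__main__":
    for name, run in (("fixed step", fixed_step), ("backtracking", backtracking)):
        w, passes = run(5)
        print(name, "w =", w, "passes =", passes)
```

In this sketch the backtracking variant spends more passes for the same
number of iterations, so it pays off only if it saves enough iterations to
compensate. An implementation that reuses the accepted trial's evaluation
could save a pass per iteration, so the real trade-off depends on the
details and would need to be measured.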
It seems @staple has already implemented backtracking (he has results for it
in the JIRA) but kept it out of this PR to keep it simple, so we can tackle
that afterwards.