Github user etrain commented on the pull request:
https://github.com/apache/spark/pull/886#issuecomment-46465872
I've taken a first pass at this and at a high level it looks good.
The main two things I'd say are
1) I think an implicit that converts LabeledPoint to WeightedLabeledPoint
could go a long way at removing some of the boilerplate introduced by this PR.
2) I'm getting a little concerned that we could modularize a little better
- for example, every time we do a "strategy.algo match" - it feels like we
could just as easily have a separate class for Regression algo, Decision algo,
etc. For example, each separate algo could implement its own "binSeqOp" and a
few other methods and the base class could tie these all together. This would
be a fairly major refactoring and is maybe better suited for a later PR.
I still need to look closely at the principal logical changes in
DecisionTree.scala - and will try to get to this before the end of the week.
Thanks for your patience!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---