Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/7294#issuecomment-122143399
@manishamde Thanks for taking a look! I do want to run some performance
tests to make sure they are similar. For accuracy, the current tests run all unit
tests and check that ml and mllib output exactly the same models. I hope that's
sufficient (but I do think we should continue to improve unit tests).
@mengxr Thanks for reviewing! If I had felt at liberty to do more of these
cleanups, I would have. :) I'm going ahead with the cleanups, save for
unpersisting the last prevNodeIdsForInstances.
W.r.t. making the changes discoverable, I agree it's a hard problem. I
think removing bins was worth the trouble to avoid creating even more work for
ourselves (new tree representation with bins, using bins internally, removing
it eventually). I was relying on unit tests to give confidence. It'd be good
to come up with ideas for the next time this happens.
Let me know if there is a problem with storing the FileSystem instance. (I
assume it's fine but don't know much about HDFS...)