Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3022#issuecomment-66563244
@tgaloppo Thanks very much for the PR, and sincere apologies for the slow
response about it! @manishamde was right about people being too preoccupied
with the 1.2 release. It will be great to get GMMs into MLlib though!
Iâve added some inline comments, and have put a few general comments
below.
Weâre moving away from some of the old API conventions, and it would be
nice to try to fit this to match newer APIs (especially the experimental
spark.ml branch). In particular, the static train() methods are part of the
old API: Itâs becoming hard to maintain train() methods with explicit lists
of parameters (as we add more parameters to existing algorithms). Weâd
prefer to stick with the builder pattern you have implemented in
GMMExpectationMaximization, where you can call setter methods (setK, etc.) to
set parameters before calling run().
* I would recommend eliminating object GMMExpectationMaximization and
keeping the class API basically as is.
* Could you please add getter methods (getK, etc.) to the class
GMMExpectationMaximization?
Tests: Iâd recommend adding more tests. Itâs good to test a single
cluster, as you did. It might also be good to test multiple clusters, where
you take precautions to make sure the test will almost certainly succeed (e.g.,
use pre-selected random seeds or enough trials). Iâll think more about
possible tests.
Scaling: It will be important to get a sense of how efficient/scalable the
implementation is. Would you be able to run tests on a small cluster? If not,
the community might be able to help.
Side note: I noticed this is a PR from your master branch. Itâs
generally easier to create a separate branch for each PR you plan to contribute.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]