mengxr edited a comment on pull request #33800: URL: https://github.com/apache/spark/pull/33800#issuecomment-904140869
@srowen @PhillHenry My comment in the JIRA is mostly on the correctness. I think there are couple issues with the current implementation: * `Generator`, `Limits`, and `RangeRanges` are public. Is it necessary? * MLlib algorithm's behavior should be fully determined by its params. Allowing implicit random generator breaks this rule. Instead, it should take a `seed` from users. * The `ParamRandomBuilder` doesn't have a good ScalaDoc. It starts with a quote. And without context, it is misleading because `within 5%` is measured by volume, instead of accuracy. I think we should provide a clean implementation without mixing in `ParamGridBuilder`. I think `RandomParamMapsBuilder` may be a better name. It should take a seed and `numParamsMap` as well as methods to take params and random ranges (without specifying `n`). Besides numerical values, we might want to add a random choice method for discrete string params. I don't think we need `addGrid` because it is confusing. The result is no longer a grid and it is hard to understand it along with `numParamsMaps`. I don't think we should release it in the current form. We can try to fix it or revert it in branch 3.2 if running late. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
