Re: Hyperparameter Optimization via Randomization

2021-02-09 Thread Phillip Henry
Hi, Sean. I've added a comment in the new class suggesting a look at Hyperopt etc. if the user is using Python. Anyway, I've created a pull request: https://github.com/apache/spark/pull/31535 and all tests, style checks etc. pass. Wish me luck :) And thanks for the support :) Phillip

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Sean Owen
It seems pretty reasonable to me. If it's a pull request, we can code review it. My only questions are: would it be better to tell people to use hyperopt, and how much better is this than implementing randomization on the grid? But the API change isn't significant, so it's maybe just fine.

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Phillip Henry
Hi, Sean. I don't think sampling from a grid is a good idea, as the min/max may lie between grid points. Unconstrained random sampling avoids this problem. To this end, I have an implementation at: https://github.com/apache/spark/compare/master...PhillHenry:master It is unit tested and does not …
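Phillip's point can be illustrated with a small, Spark-free sketch (function names are mine, not from the linked branch): draws from a fixed grid can only ever hit the grid points, while draws from a continuous range can land anywhere between them.

```python
import random

def sample_grid(grid, n, seed=0):
    # Grid sampling: every draw is one of the pre-chosen grid points.
    rng = random.Random(seed)
    return [rng.choice(grid) for _ in range(n)]

def sample_uniform(lo, hi, n, seed=0):
    # Unconstrained sampling: draws can fall anywhere in [lo, hi],
    # including between the grid points above.
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for _ in range(n)]

grid = [0.01, 0.1, 1.0]
print(sample_grid(grid, 5))          # only values from the grid
print(sample_uniform(0.01, 1.0, 5))  # arbitrary values in [0.01, 1.0]
```

If the true optimum sits at, say, 0.3, the grid above can never test it, whereas the continuous sampler can.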

Re: Hyperparameter Optimization via Randomization

2021-01-30 Thread Sean Owen
I was thinking ParamGridBuilder would have to change to accommodate a continuous range of values, and that's not hard, though other code wouldn't understand that type of value, like the existing simple grid builder. It's all possible; I'm just wondering if simply randomly sampling the grid is enough.

Re: Hyperparameter Optimization via Randomization

2021-01-30 Thread Phillip Henry
Hi, Sean. Perhaps I don't understand. As I see it, ParamGridBuilder builds an Array[ParamMap]. What I am proposing is a new class that also builds an Array[ParamMap] via its build() method, so there would be no "change in the APIs". This new class would, of course, have methods that defined the …
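The contract Phillip describes can be sketched without Spark: a builder that, like ParamGridBuilder, exposes a build() returning a collection of param maps (plain dicts below standing in for Array[ParamMap]). The class and method names are illustrative only, not the actual API in the proposal.

```python
import random

class RandomParamBuilder:
    """Hypothetical stand-in for the proposed class: same build() -> list of
    param maps shape as ParamGridBuilder, but values are drawn at random."""

    def __init__(self, n, seed=None):
        self.n = n                      # number of param maps to generate
        self.rng = random.Random(seed)
        self.ranges = {}                # param name -> (lo, hi)

    def add_uniform(self, name, lo, hi):
        self.ranges[name] = (lo, hi)
        return self                     # fluent style, like ParamGridBuilder

    def build(self):
        # Each element is one randomly drawn param map.
        return [
            {name: self.rng.uniform(lo, hi)
             for name, (lo, hi) in self.ranges.items()}
            for _ in range(self.n)
        ]

maps = RandomParamBuilder(n=3, seed=42).add_uniform("regParam", 0.0, 1.0).build()
print(maps)
```

Because build() returns the same shape of data as the grid builder, downstream consumers (e.g. a cross-validator taking an array of param maps) need no API change, which is the thrust of the argument above.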

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I think that's a bit orthogonal - right now you can't specify continuous spaces. The straightforward thing is to allow random sampling from a big grid. You can create a geometric series of values to try, of course - 0.001, 0.01, 0.1, etc. Yes, I get that if you're randomly choosing, you can …
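Sean's "geometric series of values" is easy to generate explicitly; random search over it then just picks from that discrete set. A minimal sketch (not Spark code):

```python
import random

# Geometric series of candidate values: 0.001, 0.01, 0.1.
geom = [10.0 ** e for e in range(-3, 0)]
print(geom)

# Random sampling from the big-grid approach: each trial picks one value.
rng = random.Random(0)
trials = [rng.choice(geom) for _ in range(4)]
print(trials)
```

This is the compromise being discussed: it keeps the existing discrete-grid machinery, at the cost that every tested value must be one of the pre-enumerated grid points.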

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Thanks, Sean! I hope to offer a PR next week. Not sure about a dependency on the grid search, though - happy to hear your thoughts. I mean, you might want to explore logarithmic space evenly. For example, something like "please search 1e-7 to 1e-4" leads to a reasonably random sample being …
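Exploring logarithmic space evenly is usually done by drawing the exponent uniformly, so each decade between 1e-7 and 1e-4 is equally likely. A sketch of that idea (the helper name is mine):

```python
import math
import random

def log_uniform(lo, hi, rng):
    # Draw the base-10 exponent uniformly, then exponentiate: values are
    # spread evenly across decades rather than bunched near the upper end.
    return 10.0 ** rng.uniform(math.log10(lo), math.log10(hi))

rng = random.Random(1)
samples = [log_uniform(1e-7, 1e-4, rng) for _ in range(5)]
print(samples)
```

Contrast with a plain uniform draw over [1e-7, 1e-4], where roughly 90% of samples would land in the top decade [1e-5, 1e-4] alone.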

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I don't know of anyone working on that. Yes, I think it could be useful. I think it might be easiest to implement by simply having some parameter to the grid search process that says what fraction of all possible combinations you want to randomly test.
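Sean's suggestion, sketched without Spark (the parameter names and the `fraction` knob are illustrative): enumerate the full grid, then randomly test only a fraction of the combinations.

```python
import itertools
import random

grid = {"regParam": [0.001, 0.01, 0.1], "maxIter": [10, 50, 100]}

# Full Cartesian product: 3 x 3 = 9 param maps.
combos = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]

# The proposed knob: test only this fraction of all combinations.
fraction = 0.5
k = max(1, int(len(combos) * fraction))
subset = random.Random(0).sample(combos, k)
print(f"testing {k} of {len(combos)} combinations")
```

This keeps the grid-builder API untouched and adds one number, which is why it reads as the "easiest to implement" option in the thread.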

Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Hi, I have no work at the moment, so I was wondering if anybody would be interested in me contributing code that generates an Array[ParamMap] for random hyperparameters? Apparently, this technique can find a hyperparameter in the top 5% of parameter space in fewer than 60 iterations with 95% …
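The truncated figure matches the standard random-search argument (popularized by Bergstra and Bengio's work on random search): if each independent draw lands in the top 5% of the search space with probability 0.05, then the chance that at least one of n draws does is 1 - 0.95^n, which first reaches 95% at n = 59, i.e. fewer than 60 iterations. A quick check:

```python
# Probability that at least one of n independent random draws lands in the
# top 5% of the search space is 1 - 0.95**n. Find where it first reaches 95%.
n = next(n for n in range(1, 200) if 1 - 0.95 ** n >= 0.95)
print(n)  # 59
```

The argument makes no assumption about the shape of the objective, which is what makes random search such a robust baseline.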