PhillHenry opened a new pull request #31534:
URL: https://github.com/apache/spark/pull/31534


   ### What changes were proposed in this pull request?
   
   This PR adds code that generates random parameter values for hyperparameter 
tuning. A discussion with Sean Owen can be found on the dev mailing list here:
   
   
http://apache-spark-developers-list.1001551.n3.nabble.com/Hyperparameter-Optimization-via-Randomization-td30629.html
   
   ### Why are the changes needed?
   
   Randomization can be a more effective technique than a grid search, since 
the minimum or maximum can fall between grid points and never be found. 
Randomization is not restricted in this way, although the probability of 
finding minima/maxima depends on the number of attempts. 
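
To make the point above concrete, here is a minimal, Spark-free sketch. The objective `score`, its peak at 0.37, and the names `GridVsRandom`, `grid`, `randomCandidates` and `best` are all hypothetical and chosen only for illustration; they are not part of this PR.

```scala
import scala.util.Random

object GridVsRandom {
  // Hypothetical objective: a narrow peak centred at x = 0.37,
  // which no point of a 0.1-step grid lands on.
  def score(x: Double): Double = math.exp(-math.pow((x - 0.37) / 0.01, 2))

  // Evenly spaced candidates: 0.0, 0.1, ..., 1.0
  val grid: Seq[Double] = (0 to 10).map(_ / 10.0)

  // Candidates drawn uniformly at random from the unit interval.
  // The seed is fixed only so that runs are reproducible.
  def randomCandidates(n: Int, seed: Long): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextDouble())
  }

  // Best score found among a set of candidates.
  def best(candidates: Seq[Double]): Double = candidates.map(score).max

  def main(args: Array[String]): Unit = {
    println(s"grid best score:   ${best(grid)}")
    println(s"random best score: ${best(randomCandidates(100, seed = 42L))}")
  }
}
```

The grid can never score well here because its closest point (0.4) is three peak-widths away from the optimum. The random candidates have no such structural limit, but whether any of them lands near the peak depends on how many are drawn.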
   
   Alice Zheng has an accessible description of how this technique works at 
https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html
   
   Although there are Python libraries with more sophisticated techniques, not 
every Spark developer is using Python. 
   
   ### Does this PR introduce _any_ user-facing change?
   
   A new class (in `ParamRandomBuilder.scala`) and its tests have been created, 
but there is no change to existing code. This class offers an alternative to 
`ParamGridBuilder` and can be dropped into the code wherever `ParamGridBuilder` 
appears. Indeed, it extends `ParamGridBuilder` and is completely compatible 
with its interface. It merely adds one method that takes a range over which 
a hyperparameter's values will be randomly chosen.
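
As a rough illustration of the idea (a grid builder plus one extra method that fills an axis with random values from a range), here is a self-contained sketch in plain Scala. It is not the code in this PR and not the Spark API: the class `RandomParamSketch`, its method names and signatures, the use of `String` keys and `Double` values, and the uniform sampling are all assumptions made for the example.

```scala
import scala.util.Random

// Hypothetical stand-in for a parameter builder; NOT the Spark API.
final class RandomParamSketch(seed: Long) {
  private val rng = new Random(seed)
  private var axes: Vector[(String, Seq[Double])] = Vector.empty

  // Register a fixed list of values for one parameter, as a grid builder would.
  def addGrid(name: String, values: Seq[Double]): this.type = {
    axes = axes :+ (name -> values)
    this
  }

  // Register n values for one parameter, drawn uniformly between lo and hi.
  def addRandom(name: String, lo: Double, hi: Double, n: Int): this.type = {
    require(lo < hi && n > 0, "need lo < hi and n > 0")
    addGrid(name, Seq.fill(n)(lo + rng.nextDouble() * (hi - lo)))
  }

  // Cross product of all registered axes, one Map per combination.
  def build(): Seq[Map[String, Double]] =
    axes.foldLeft(Seq(Map.empty[String, Double])) { case (acc, (name, values)) =>
      for (m <- acc; v <- values) yield m + (name -> v)
    }
}
```

Under these assumptions, `new RandomParamSketch(7L).addGrid("maxIter", Seq(10.0, 20.0)).addRandom("regParam", 0.0, 1.0, 3).build()` would yield six parameter maps: each of the two fixed `maxIter` values paired with each of the three random `regParam` draws.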
   
   ### How was this patch tested?
   
   Tests `ParamRandomBuilderSuite.scala` and `RandomRangesSuite.scala` were 
added.
   
   `ParamRandomBuilderSuite` is the analogue of the already existing 
`ParamGridBuilderSuite` which tests the user-facing interface.
   
   `RandomRangesSuite` uses ScalaCheck to test the random ranges over which 
hyperparameters are distributed.




