> In my case I don't use RPROP (I don't know what it is and I just use a
> simple backprop) and I use Leon Bottou's trick to perform a burn-in on
> the first 10k samples with a grid search of learning rate parameters
> and then select the most effective learning rate and multiply it by 2
> (it brings robustness). In my experiment it did work pretty well.
> I only learned about this trick recently and haven't really used it yet.

We tried it on convolutional nets and it didn't work well :-/ Maybe I'll
give it another shot.
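For anyone curious, here is my understanding of the burn-in trick as a
rough Python sketch (not Leon's actual code -- the eta grid and the
linear/squared-loss model are made up for illustration; his real
calibration code is the C file linked below):

    import numpy as np

    def sgd_burn_in_calibration(X, y, etas=(1e-4, 1e-3, 1e-2, 1e-1, 1.0),
                                n_burn_in=10000):
        """Run SGD with each candidate learning rate on the first
        n_burn_in samples, keep the rate with the lowest burn-in loss,
        and return it multiplied by 2 for robustness."""
        n = min(n_burn_in, X.shape[0])
        best_eta, best_loss = etas[0], np.inf
        for eta in etas:
            w = np.zeros(X.shape[1])
            for i in range(n):
                g = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5*(x.w - y)^2
                w -= eta * g
            loss = 0.5 * np.mean((X[:n] @ w - y[:n]) ** 2)
            if np.isfinite(loss) and loss < best_loss:
                best_eta, best_loss = eta, loss
        return 2.0 * best_eta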
RPROP maintains a dynamic learning rate for each parameter and only looks
at the sign of the gradient. There are two parameters, but these are
always set to the values given in the paper "A direct adaptive method for
faster backpropagation learning: The RPROP algorithm" (Riedmiller &
Braun), so in practice there are no parameters at all, which is pretty
convenient. (A rough sketch of the update rule is at the end of this
mail.)

> I used to use a 1/t style learning rate schedule but yesterday Francis
> Bach convinced me to use 1/sqrt(t) and use averaging instead.

I think Leon Bottou also uses something different when averaging; I
thought it was t^0.75 or something. Maybe I'll do it without averaging
first. (Also sketched at the end.)

> Here is the calibration stuff:
>
> https://bitbucket.org/ogrisel/libsgd/src/0a232b053b5b/lib/architecture.c#cl-360
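Back to RPROP: in case it is useful, here is roughly what the update rule
looks like (the simple variant without weight backtracking; note RPROP is
a batch method, so g should be the gradient over the full training set;
1.2 / 0.5 and the step bounds are the defaults from the Riedmiller &
Braun paper):

    import numpy as np

    def rprop_step(w, g, g_prev, step, eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
        """One RPROP update: each parameter keeps its own step size,
        grown while the gradient sign agrees with the previous step and
        shrunk when it flips; only sign(g) is used, never |g|."""
        same = g * g_prev > 0
        flip = g * g_prev < 0
        step[same] = np.minimum(step[same] * eta_plus, step_max)
        step[flip] = np.maximum(step[flip] * eta_minus, step_min)
        g = np.where(flip, 0.0, g)    # skip updates where the sign flipped
        w = w - np.sign(g) * step
        return w, g, step             # pass g back in as g_prev next time

    # init once: step = np.full_like(w, 0.1); g_prev = np.zeros_like(w)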
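And what Francis suggested, as I understand it: decay the rate as
eta0/sqrt(t) but return the average of the iterates rather than the last
one (Polyak-Ruppert averaging). Same toy linear/squared-loss setup as
above, just to keep it concrete. I believe Leon's ASGD code actually
decays the rate as eta0 * (1 + lambda * eta0 * t)^-0.75 when averaging,
which may be the t^0.75 I had in mind:

    import numpy as np

    def asgd(X, y, eta0, n_epochs=1):
        """SGD with a 1/sqrt(t) learning rate schedule plus
        Polyak-Ruppert averaging of the iterates."""
        w = np.zeros(X.shape[1])
        w_bar = np.zeros(X.shape[1])
        t = 0
        for _ in range(n_epochs):
            for i in range(X.shape[0]):
                t += 1
                g = (X[i] @ w - y[i]) * X[i]   # squared-loss gradient
                w -= eta0 / np.sqrt(t) * g
                w_bar += (w - w_bar) / t       # running mean of the iterates
        return w_bar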
