2011/11/4 Andreas Müller <[email protected]>:
> On 11/04/2011 06:33 PM, David Warde-Farley wrote:
>> On Fri, Nov 04, 2011 at 06:12:48PM +0100, Andreas Müller wrote:
>>>> In my case I don't use RPROP (I don't know what it is and I just use a
>>>> simple backprop) and I use Leon Bottou's trick to perform a burn-in on
>>>> the first 10k samples with a grid search of learning rate parameters
>>>> and then select the most effective learning rate and multiply it by 2
>>>> (it brings robustness). In my experiment it did work pretty well.
>>>>
>>> I only learned about this trick recently and haven't really used it yet.
>>> We tried it on convolutional nets and it didn't work well :-/
>>> Maybe I'll give it another shot.
>>>
>>> RPROP maintains a dynamic learning rate for each parameter.
>> Sounds a bit like "delta-bar-delta".
>>
> Don't know about that. RPROP is pretty old, 1993 I think:
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=298623
>
>>> It only looks at the sign of the gradient.
>> Surely it can't work online then, can it?
> It can work with mini-batches that are "large enough",
> I think. But not really online, no.
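For concreteness, here are rough sketches of the two ideas quoted above
(plain numpy / scikit-learn, nothing from the thread itself).

First, the learning-rate burn-in: grid-search the learning rate on the
first 10k samples, keep the most effective one and multiply it by 2. The
candidate grid is only illustrative, and training accuracy is used as a
cheap stand-in for the training loss:

import numpy as np
from sklearn.linear_model import SGDClassifier

def pick_learning_rate(X, y, candidates=(1e-4, 1e-3, 1e-2, 1e-1),
                       n_burnin=10000):
    """Burn-in on the first n_burnin samples, return 2x the best rate."""
    X_burn, y_burn = X[:n_burnin], y[:n_burnin]
    scores = []
    for eta0 in candidates:
        clf = SGDClassifier(learning_rate="constant", eta0=eta0)
        clf.fit(X_burn, y_burn)
        scores.append(clf.score(X_burn, y_burn))  # proxy for training loss
    best = candidates[int(np.argmax(scores))]
    return 2.0 * best  # the factor-of-2 robustness heuristic

Second, RPROP's sign-based, per-parameter step sizes. This is a simplified
sketch of the idea rather than the exact 1993 algorithm (which also skips
or reverts the step when the gradient changes sign); the constants follow
the commonly used values:

import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=1.0):
    """One RPROP-style update; w, grad, prev_grad and step are arrays
    of the same shape, step holding the per-parameter learning rates."""
    agreement = np.sign(grad) * np.sign(prev_grad)
    # gradient kept its sign -> grow the step, sign flipped -> shrink it
    step = np.where(agreement > 0,
                    np.minimum(step * eta_plus, step_max),
                    np.where(agreement < 0,
                             np.maximum(step * eta_minus, step_min),
                             step))
    # only the sign of the gradient drives the actual move
    return w - np.sign(grad) * step, step

Note that prev_grad and step are exactly the extra per-parameter state
mentioned below.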
Even if the mini-batch is not large enough, you can remember past
information from a large window in constant memory using exponentially
weighted moving averages:
https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average

> Also you need twice as much memory as you have
> to keep gradient information and current learning rates
> in memory.
> These are definitely two downsides.

As long as it's pre-allocated before the main loop, that should not be a
problem.
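To make the moving-average point concrete, here is a minimal sketch
(assuming all we need is a smoothed gradient): a single extra array, so
memory stays constant no matter how far back the effective window reaches,
with alpha roughly setting the window size (~ 1 / alpha mini-batches).

import numpy as np

def update_gradient_ema(grad_ema, grad, alpha=0.1):
    """Exponentially weighted moving average of the gradient."""
    if grad_ema is None:  # first mini-batch: start from the raw gradient
        return np.asarray(grad, dtype=float).copy()
    return (1.0 - alpha) * grad_ema + alpha * grad

The sign of grad_ema could then drive an RPROP-style update even when a
single mini-batch is too small to give a reliable sign on its own.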
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel