On 11/04/2011 06:33 PM, David Warde-Farley wrote:
> On Fri, Nov 04, 2011 at 06:12:48PM +0100, Andreas Müller wrote:
>>> In my case I don't use RPROP (I don't know what it is; I just use
>>> simple backprop), and I use Leon Bottou's trick of doing a burn-in on
>>> the first 10k samples with a grid search over learning rates,
>>> then selecting the most effective learning rate and multiplying it by 2
>>> (which adds some robustness). In my experiments it worked pretty well.
>>>
>> I only learned about this trick recently and haven't really used it yet.
>> We tried it on convolutional nets and it didn't work well :-/
>> Maybe I'll give it another shot.
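
For reference, here is roughly how I understand the trick. Just a
sketch, with SGDClassifier standing in for the actual model, and the
candidate rates and the 10k cutoff as placeholders:

import numpy as np
from sklearn.linear_model import SGDClassifier

def pick_learning_rate(X, y, candidate_etas=(1e-4, 1e-3, 1e-2, 1e-1)):
    # burn-in: evaluate each candidate rate on the first 10k samples only
    # (both the candidate grid and the 10k cutoff are placeholders)
    X_burn, y_burn = X[:10000], y[:10000]
    best_eta, best_err = None, np.inf
    for eta in candidate_etas:
        clf = SGDClassifier(learning_rate="constant", eta0=eta)
        clf.fit(X_burn, y_burn)
        err = 1.0 - clf.score(X_burn, y_burn)  # crude proxy for training loss
        if err < best_err:
            best_eta, best_err = eta, err
    # as described above: take the winning rate and multiply it by 2
    return 2 * best_eta
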
>>
>> RPROP maintains a dynamic learning rate for each parameter.
> Sounds a bit like "delta-bar-delta".
>
Don't know about that. RPROP is pretty old, 1993 I think:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=298623

>> It only looks at the sign of the gradient. 
> Surely it can't work online then, can it?
It can work with mini-batches that are "large enough",
I think, but not really online, no.
Also, you need roughly twice as much memory, since you have
to keep the previous gradient and the current learning rate
for every parameter.
These are definitely two downsides.
We still used it successfully on "big" datasets like NORB
jittered-cluttered and CIFAR.

If you can afford batch learning, I think it is worth a try
since there are no parameters to tune and it
often works well.
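
For reference, a rough sketch of the basic Rprop update as I
understand it (the plain variant without weight backtracking, with the
usual constants from the paper). The two extra per-parameter arrays,
prev_grad and step, are exactly where the factor-of-two memory
overhead comes from:

import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    # grow the step where the gradient kept its sign, shrink it where
    # the sign flipped; the gradient magnitude itself is never used
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    # move each parameter against the sign of its (large-batch) gradient
    w = w - np.sign(grad) * step
    return w, step, grad  # grad becomes prev_grad for the next batch

You would call this once per batch, carrying step (initialized to
something like 0.1 everywhere) and the previous gradient along between
calls.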

