2011/11/4 Andreas Müller <[email protected]>:
>
>>> For a simple mlp, I think theano will not beat a hand implemented version.
>>
>> I think you'd be in for a rather rude surprise, at least on your first
>> attempt. :)
>>
> It'll not be my first attempt, but I must confess I never benchmarked
> my lab's GPU MLP against yours ;)
>
>>> Afaik, torch7 is faster than theano for cnns and mlps and there
>>> is no compilation of algorithms there.
>>
>> Haven't looked at Torch7, though I know we beat Torch5 pretty painfully.
>>
>>> But I thought more about an easy to use classifier.
>>
>> That, I think, is the fundamental flaw in the plan. Neural networks are
>> anything but "easy to use", and getting good results out of them takes
>> quite a bit of work.
>>
>> I say this (perhaps at my own peril) as a student in one of the larger
>> labs that still study this stuff, but there are really three regimes
>> where neural networks make sense over the stuff already in scikit-learn:
>>
>> - The dataset is *gigantic*, online learning is essential, and simpler
>>   algorithms don't cut it.
>>
>> - The dataset is huge and the task complex enough that it requires
>>   multiple layers of representation and/or sophisticated pre-training
>>   algorithms (unsupervised feature learning).
>>
>> - The dataset is slightly smaller, linear learning doesn't suffice,
>>   but model compactness and speed/efficiency of evaluation are of great
>>   importance, so kernel methods won't work.
>>
>> In my experience, about 95% of the time, people trying to apply MLPs and
>> failing are not in any of these situations and would be better served
>> with methods that are easily "canned" for non-expert use.
>>
> I am only part of a very small lab that still studies this stuff, so I
> guess you have more experience in these things.
> I was mainly thinking about the first use case.
> For example, in this paper:
> http://www.cs.cornell.edu/~ainur/pubs/empirical.pdf
> neural networks fare pretty well, it seems without too much tuning.
>
> In my experience, the hardest thing to find is a good learning rate.
> Using RPROP, I always got pretty decent results on the first try.
>
> What kind of datasets have you used? And what kind of tuning
> did you have to do?
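[Editorial note: for readers who have not met RPROP, it is a sign-based update rule in which each weight keeps its own step size, grown while the gradient sign stays stable and shrunk when it flips, so no global learning rate needs tuning. The sketch below is not from the thread; the function name and constants are illustrative defaults for the RPROP- variant.]

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RPROP- update; w and step are modified in place.

    All arguments are numpy arrays of the same shape; `step` holds the
    current per-weight step sizes.  Returns the gradient to remember
    for the next call.
    """
    sign_change = grad * prev_grad
    # Same sign as last iteration: grow the step size (capped).
    step[sign_change > 0] = np.minimum(step[sign_change > 0] * eta_plus, step_max)
    # Sign flipped: we overshot, so shrink the step size ...
    step[sign_change < 0] = np.maximum(step[sign_change < 0] * eta_minus, step_min)
    # ... and skip the update for those weights this iteration (RPROP-).
    grad = grad.copy()
    grad[sign_change < 0] = 0.0
    # Move each weight by its own step size, using only the gradient's sign.
    w -= np.sign(grad) * step
    return grad
```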
In my case I don't use RPROP (I don't know what it is; I just use plain
backprop). I use Leon Bottou's trick of doing a burn-in on the first 10k
samples with a grid search over learning rate parameters, then selecting the
most effective learning rate and multiplying it by 2 (it brings robustness).
In my experiments it worked pretty well.

I used to use a 1/t style learning rate schedule, but yesterday Francis Bach
convinced me to use 1/sqrt(t) and use averaging instead.

Here is the calibration stuff:

  https://bitbucket.org/ogrisel/libsgd/src/0a232b053b5b/lib/architecture.c#cl-360

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
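[Editorial note: a rough sketch of the two ideas described above, namely calibrating the initial learning rate on a burn-in prefix of the data and then running SGD with a 1/sqrt(t) schedule plus averaging of the iterates. It is not the libsgd code linked above; the linear hinge-loss model, the candidate grid, and the helper names are assumptions made for illustration.]

```python
import numpy as np

def hinge_grad(w, x, y):
    """Sub-gradient of the hinge loss for one sample (y in {-1, +1})."""
    if y * np.dot(w, x) < 1.0:
        return -y * x
    return np.zeros_like(x)

def calibrate_eta0(X, y, candidates=(1e-3, 1e-2, 1e-1, 1.0), burn_in=10000):
    """Burn-in calibration: try each candidate eta0 on the first `burn_in`
    samples, keep the one with the lowest training error on that prefix,
    then multiply it by 2 as described in the mail (grid is an assumption)."""
    Xb, yb = X[:burn_in], y[:burn_in]
    best_eta0, best_err = None, np.inf
    for eta0 in candidates:
        w = np.zeros(X.shape[1])
        for t, (x, yt) in enumerate(zip(Xb, yb), start=1):
            w -= eta0 / np.sqrt(t) * hinge_grad(w, x, yt)
        err = np.mean(np.sign(Xb @ w) != yb)
        if err < best_err:
            best_eta0, best_err = eta0, err
    return 2.0 * best_eta0

def averaged_sgd(X, y, eta0):
    """SGD with a 1/sqrt(t) schedule, returning the running average of the
    iterates (Polyak-Ruppert averaging) instead of the last iterate."""
    w = np.zeros(X.shape[1])
    w_avg = np.zeros_like(w)
    for t, (x, yt) in enumerate(zip(X, y), start=1):
        w -= eta0 / np.sqrt(t) * hinge_grad(w, x, yt)
        w_avg += (w - w_avg) / t   # running mean of the iterates
    return w_avg
```

Usage would look like `eta0 = calibrate_eta0(X_train, y_train)` followed by `w = averaged_sgd(X_train, y_train, eta0)`; the averaging is what makes the more aggressive 1/sqrt(t) schedule safe compared to 1/t.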
