2011/11/4 Andreas Müller <[email protected]>:
>
>>> For a simple MLP, I think Theano will not beat a hand-implemented version.
>> I think you'd be in for a rather rude surprise, at least on your first
>> attempt. :)
>>
> It won't be my first attempt, but I must confess I've never benchmarked
> my lab's GPU MLP against yours ;)
>>> AFAIK, Torch7 is faster than Theano for CNNs and MLPs, and there
>>> is no compilation of algorithms there.
>> Haven't looked at Torch7, though I know we beat Torch5 pretty painfully.
>>
>>> But I was thinking more of an easy-to-use classifier.
>> That, I think, is the fundamental flaw in the plan. Neural networks are
>> anything but "easy to use", and getting good results out of them takes quite
>> a bit of work.
>>
>> I say this (perhaps at my own peril) as a student in one of the larger labs
>> that still study this stuff, but there are really three regimes where neural
>> networks make sense over the stuff already in scikit-learn:
>>
>> - The dataset is *gigantic*, online learning is essential, and simpler
>>   algorithms don't cut it.
>>
>> - The dataset is huge and the task complex enough that it requires multiple
>>   layers of representation and/or sophisticated pre-training algorithms
>>   (unsupervised feature learning).
>>
>> - The dataset is slightly smaller, linear learning doesn't suffice,
>>   but model compactness and speed/efficiency of evaluation are of great
>>   importance, so kernel methods won't work.
>>
>> In my experience, about 95% of the time, people trying to apply MLPs and
>> failing are not in any of these situations and would be better served with
>> methods that are easily "canned" for non-expert use.
>>
> I am only part of a very small lab that still studies this stuff, so I guess
> you have more experience with these things.
> I was mainly thinking about the first use case.
> For example, in this paper:
> http://www.cs.cornell.edu/~ainur/pubs/empirical.pdf
> neural networks fare pretty well, seemingly without too much tuning.
>
> In my experience, the hardest thing to find is a good learning rate.
> Using RPROP, I always got pretty decent results on the first try.
>
> What kind of datasets have you used? And what kind of tuning
> did you have to do?

In my case I don't use RPROP (I don't know what it is; I just use simple
backprop). Instead I use Leon Bottou's trick: do a burn-in on the first 10k
samples with a grid search over learning rate parameters, select the most
effective learning rate, and multiply it by 2 (which adds some robustness).
In my experiments this worked pretty well.
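
Roughly, the calibration amounts to something like the following minimal
Python sketch of how I understand the trick (not the libsgd code itself;
`grad` and `loss` here are hypothetical callables for the per-sample
gradient and the training loss):

    import numpy as np

    def calibrate_eta0(grad, loss, w0, X, y,
                       etas=(1e-4, 1e-3, 1e-2, 1e-1, 1.0), n_burnin=10000):
        # Burn in plain SGD on the first n_burnin samples for each candidate
        # learning rate and keep the one with the lowest training loss.
        X_b, y_b = X[:n_burnin], y[:n_burnin]
        best_eta, best_loss = None, np.inf
        for eta in etas:
            w = w0.copy()
            for xi, yi in zip(X_b, y_b):
                w -= eta * grad(w, xi, yi)   # plain SGD step, constant eta
            l = loss(w, X_b, y_b)
            if np.isfinite(l) and l < best_loss:
                best_eta, best_loss = eta, l
        # Scale the winner by 2, as described above, for robustness.
        return 2.0 * best_eta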

I used to use a 1/t-style learning rate schedule, but yesterday Francis
Bach convinced me to use 1/sqrt(t) with averaging instead.
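
Concretely, that schedule plus averaging looks something like this (a rough
Python sketch of my understanding, not Francis' exact recipe; again `grad`
is a hypothetical per-sample gradient callable):

    import numpy as np

    def averaged_sgd(grad, w0, X, y, eta0, n_epochs=1):
        w = w0.copy()
        w_avg = np.zeros_like(w0)
        t = 0
        for _ in range(n_epochs):
            for xi, yi in zip(X, y):
                t += 1
                eta = eta0 / np.sqrt(t)        # 1/sqrt(t) step size schedule
                w -= eta * grad(w, xi, yi)
                w_avg += (w - w_avg) / t       # running average of the iterates
        # Return the averaged parameters rather than the last iterate.
        return w_avg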

Here is the calibration code:
https://bitbucket.org/ogrisel/libsgd/src/0a232b053b5b/lib/architecture.c#cl-360

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
