2011/11/4 Andreas Müller <[email protected]>:
> Hi everybody.
> I was thinking about putting some work into making a multi-layer
> perceptron implementation for sklearn. I think it would be a good
> addition to the other, mostly linear, classifiers in sklearn.
> Together with the decision trees / boosting that many people are
> working on at the moment, I think sklearn would cover most of the
> classifiers used today.
> My question is: has anyone started with an MLP implementation yet?
> Or is there any code lying around that people think is already
> pretty good?
> I would try to keep it simple with support only for one hidden layer
> and do a pure python implementation to start with.

In the past (before getting involved in scikit-learn) I had started an
unfinished library in pure C with python ctypes bindings for MLPs and
stacked autoencoders. They share basically the same data structures and
algorithms, but one is supervised and the other is unsupervised.

https://bitbucket.org/ogrisel/libsgd/wiki/Home

I think it should be pretty straightforward to rewrite this in cython
directly. The important trick is to pre-allocate memory buffers sized by
the minibatch for both the hidden and output layers.
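
For illustration, here is a minimal numpy sketch of that pre-allocation
trick (the shapes and names are mine, not libsgd's); a cython version
would follow the same pattern:

  import numpy as np

  batch_size, n_features, n_hidden, n_outputs = 128, 64, 100, 10
  rng = np.random.RandomState(42)

  # Weights of a single-hidden-layer MLP.
  W1 = rng.uniform(-0.1, 0.1, size=(n_features, n_hidden))
  W2 = rng.uniform(-0.1, 0.1, size=(n_hidden, n_outputs))

  # Allocate the activation buffers once, sized by the minibatch,
  # so nothing is allocated inside the training loop.
  hidden = np.empty((batch_size, n_hidden))
  output = np.empty((batch_size, n_outputs))

  def forward(X_batch):
      # Write into the pre-allocated buffers (out=) to avoid temporaries.
      np.dot(X_batch, W1, out=hidden)
      np.tanh(hidden, out=hidden)
      np.dot(hidden, W2, out=output)
      return output

  forward(rng.randn(batch_size, n_features))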

> I'm also open for any suggestions.
>
> My feature list would be:
> - online, minibatch and batch learning

I would start with minibatch (pure online learning with one sample at a
time is useless in python because of the interpreter overhead, IMHO).
Batch learning seems less interesting than minibatch.
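
To make the minibatch point concrete, here is a rough pure-python sketch
(the `grad` callback and its signature are hypothetical, not an existing
sklearn API):

  import numpy as np

  def minibatch_sgd(X, y, grad, w, learning_rate=0.01,
                    batch_size=128, n_epochs=5, seed=0):
      # `grad(w, X_batch, y_batch)` is assumed to return the gradient
      # averaged over the batch.
      rng = np.random.RandomState(seed)
      n_samples = X.shape[0]
      for epoch in range(n_epochs):
          perm = rng.permutation(n_samples)
          for start in range(0, n_samples, batch_size):
              idx = perm[start:start + batch_size]
              # One python-level call per batch rather than per sample,
              # so the interpreter overhead is amortized over the batch.
              w -= learning_rate * grad(w, X[idx], y[idx])
      return w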

> - vanilla gradient descent and rprop
> - l2 weight decay optional

l2 weight decay is equivalent to an l2 regularizer. I would add l1 and
elastic net too (or projection-based regularization).
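
To spell out the equivalence (a toy example of my own): adding
(alpha / 2) * ||w||^2 to the objective contributes alpha * w to the
gradient, which amounts to a multiplicative decay of the weights at
each step:

  import numpy as np

  lr, alpha = 0.1, 0.01
  w = np.array([1.0, -2.0, 3.0])
  grad_loss = np.array([0.5, 0.5, -0.5])  # gradient of the data loss alone

  # One step on loss + (alpha / 2) * ||w||^2 ...
  w_regularized = w - lr * (grad_loss + alpha * w)

  # ... equals decaying the weights, then stepping on the data loss alone.
  w_decayed = (1.0 - lr * alpha) * w - lr * grad_loss

  assert np.allclose(w_regularized, w_decayed)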

> - tanh nonlinearities

Momentum also seems important (and averaging might work too, even
though the objective function is non-convex in general).
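
For reference, a short sketch of the classical momentum update (the
parameter names are mine):

  import numpy as np

  def momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
      # The velocity accumulates an exponentially decaying sum of past
      # gradients (classical "heavy ball" momentum).
      velocity = momentum * velocity - learning_rate * grad
      return w + velocity, velocity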

> - a class for regression and one for classification
> - MSE and cross entropy (for classification only) loss functions

We need several loss functions and their gradients in cython (we cannot
reuse the loss functions from the SGD module since the output of an MLP
can be multivariate). For classification we will need hinge loss and
squared hinge loss (and Huber for regression). See the source of libsgd
for a list of useful loss functions.
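
As a starting point, here is what two such multivariate losses and their
gradients could look like in pure numpy before the cython port (the
signatures are my own suggestion, not libsgd's):

  import numpy as np

  def squared_loss(y_true, y_pred):
      # Multi-output regression: both arrays are (n_samples, n_outputs).
      diff = y_pred - y_true
      loss = 0.5 * np.mean(np.sum(diff ** 2, axis=1))
      grad = diff / y_true.shape[0]  # gradient w.r.t. y_pred
      return loss, grad

  def squared_hinge_loss(y_true, scores):
      # Multi-class via one-vs-rest encoding: y_true entries in {-1, +1}.
      margin = np.maximum(0.0, 1.0 - y_true * scores)
      loss = np.mean(np.sum(margin ** 2, axis=1))
      grad = -2.0 * y_true * margin / y_true.shape[0]
      return loss, grad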

> I think that would be a reasonable set of features and should
> be pretty easy to maintain.

I think there are several developers here with a good understanding of
SGD, so I don't think it would be a big maintenance burden.

In any case, before embarking on this, please read or re-read:

  http://yann.lecun.com/exdb/publis/#lecun-98b

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
