2011/11/4 Andreas Müller <[email protected]>:
> Hi everybody.
> I was thinking about putting some work into making a multi-layer
> perceptron implementation for sklearn. I think it would be a good
> addition to the other, mostly linear, classifiers in sklearn.
> Together with the decision trees / boosting that many people are
> working on at the moment, I think sklearn would cover most of the
> classifiers used today.
> My question is: has anyone started with an MLP implementation yet?
> Or is there any code lying around that people think is already
> pretty good?
> I would try to keep it simple with support only for one hidden
> layer, and do a pure Python implementation to start with.
In the past (before getting involved in scikit-learn) I had started
an unfinished library in pure C + Python ctypes bindings for MLPs and
stacked autoencoders. They share basically the same data structures
and algorithms, but one is supervised and the other unsupervised:

  https://bitbucket.org/ogrisel/libsgd/wiki/Home

I think it should be pretty straightforward to rewrite this directly
in Cython. The important trick is to pre-allocate the memory buffers
at the minibatch size for both the hidden and output layers (first
sketch at the end of this mail).

> I'm also open for any suggestions.
>
> My feature list would be:
> - online, minibatch and batch learning

I would start with minibatch: pure online learning with one sample at
a time is useless in Python because of the interpreter overhead IMHO,
and batch learning seems less interesting than minibatch.

> - vanilla gradient descent and rprop
> - l2 weight decay

Optional l2 weight decay is equivalent to an l2 regularizer. I would
add l1 and elastic net too (or projection-based regularization); see
the second sketch below.

> - tanh nonlinearities

Momentum also seems important (and averaging might work too, even
though the objective function is non-convex in general); see the
third sketch below.

> - a class for regression and one for classification
> - MSE and cross entropy (for classification only) loss functions

We need several loss functions and their gradients in Cython: we
cannot reuse the loss functions from the SGD module, since the output
of an MLP can be multi-variate. For classification we will need the
hinge loss and squared hinge loss (and Huber for regression); see the
last sketch below. The source of libsgd has a list of useful loss
functions.

> I think that would be a reasonable amount of features and should
> be pretty easy to maintain.

I think we are several developers with a good understanding of SGD,
so I don't think it would be a big maintenance burden.

In any case, before embarking on this, please read or re-read:
http://yann.lecun.com/exdb/publis/#lecun-98b
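To make the pre-allocation trick concrete, here is a rough, untested
pure NumPy sketch (the class and attribute names are made up): the
hidden and output buffers are allocated once at the minibatch size,
and the forward pass writes into them in place through the `out=`
arguments so nothing is allocated in the inner loop. A Cython version
would do the same with raw buffers.

    import numpy as np

    class MinibatchMLP(object):
        # one hidden tanh layer, linear output (sketch only)
        def __init__(self, n_features, n_hidden, n_outputs,
                     batch_size=32):
            rng = np.random.RandomState(0)
            self.W1 = rng.uniform(-0.1, 0.1, (n_features, n_hidden))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.uniform(-0.1, 0.1, (n_hidden, n_outputs))
            self.b2 = np.zeros(n_outputs)
            # activation buffers allocated once at the minibatch
            # size and reused for every minibatch
            self.hidden = np.empty((batch_size, n_hidden))
            self.output = np.empty((batch_size, n_outputs))

        def forward(self, X_batch):
            # X_batch has batch_size rows; every result is written
            # in place into the pre-allocated buffers
            np.dot(X_batch, self.W1, out=self.hidden)
            self.hidden += self.b1
            np.tanh(self.hidden, out=self.hidden)
            np.dot(self.hidden, self.W2, out=self.output)
            self.output += self.b2
            return self.output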
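For the regularizers, a naive subgradient version of the update would
look like the following (`alpha` and `l1_ratio` are made-up parameter
names, and a serious l1 implementation needs a truncation / proximal
step to actually produce exact zeros); this only shows where the
penalties enter the update:

    def penalized_sgd_step(W, grad, lr, alpha, l1_ratio):
        # elastic net penalty:
        #   alpha * (l1_ratio * ||W||_1
        #            + (1 - l1_ratio) * 0.5 * ||W||_2^2)
        # l1_ratio == 0 is plain l2 weight decay, l1_ratio == 1 is
        # pure l1; W is updated in place
        W -= lr * (grad
                   + alpha * l1_ratio * np.sign(W)
                   + alpha * (1.0 - l1_ratio) * W)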
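Classical momentum only needs one extra velocity buffer per parameter
array, updated as below (again just a sketch; mu=0.9 is a typical but
arbitrary choice):

    def momentum_step(W, grad, velocity, lr, mu=0.9):
        # velocity accumulates an exponentially decaying sum of the
        # past gradients; the parameters then follow the velocity
        velocity *= mu
        velocity -= lr * grad
        W += velocity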
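And for the loss functions, each one should return both the
elementwise loss and its derivative with respect to the prediction so
that backprop can reuse it. For example (targets y in {-1, +1} for
the hinge variants; signatures made up):

    def hinge(y, p):
        z = y * p
        return np.maximum(0.0, 1.0 - z), np.where(z < 1.0, -y, 0.0)

    def squared_hinge(y, p):
        z = np.maximum(0.0, 1.0 - y * p)
        return z ** 2, -2.0 * y * z

    def huber(y, p, delta=1.0):
        # quadratic near the target, linear in the tails
        # (robust regression)
        r = p - y
        quad = np.abs(r) <= delta
        loss = np.where(quad, 0.5 * r ** 2,
                        delta * (np.abs(r) - 0.5 * delta))
        return loss, np.where(quad, r, delta * np.sign(r))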
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel