On Tue, Mar 20, 2012 at 09:05:45PM +0100, Andreas wrote:
>> I recently posted a gist: https://gist.github.com/2061456
>> And there is also a branch by me:
>> https://github.com/amueller/scikit-learn/tree/multilayer_perceptron
> If there are strong opinions (and good reasons) for supporting
> multiple hidden layers, then we can also do that.

One possible approach is to implement an estimator only for a single-layer network, but expose primitives that would make it easy to write more complicated things if so desired. Such an approach is taken in this code, which is slow as molasses but works pretty well, using Yann LeCun's old "fprop, bprop, grad" interface (see his 2007 NIPS workshop talk):

https://gist.github.com/dwf/backproppy

In hindsight I didn't get all the software architecture right, but one thing I think can be learned from this code is a very powerful pattern: make all of your parameters ONE array, and make the individual parameters (weight matrices, biases, etc.) *non-overlapping views* on that array. This means that, without much work, you can throw your network at a generic numerical optimizer that expects one big array of parameters: you just wrap it in a Python function that overwrites the parameter array in the object and computes/returns the cost. All of the architectural complexity goes away. And for small problem sizes/training set sizes, things like conjugate gradient can converge pretty fast.

Also, with logistic sigmoid hidden units, keep this in mind: the gradient of a logistic sigmoid written in the "textbook" fashion is not numerically stable!

https://gist.github.com/dwf/backproppy/blob/master/backproppy/layers.py#L94

David

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
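[Editorial sketch of the flat-parameter pattern described above, not David's backproppy code: all weights and biases live as non-overlapping views on one flat NumPy array, so the whole network can be handed to `scipy.optimize.minimize` directly. The `TinyMLP` class, its layer sizes, and the squared-error cost are all illustrative assumptions.]

```python
import numpy as np
from scipy.optimize import minimize

class TinyMLP:
    """Single-hidden-layer network (hypothetical example) whose weight
    matrices and bias vectors are non-overlapping views on one flat
    parameter array, so a generic optimizer can drive training."""

    def __init__(self, n_in, n_hidden, n_out, rng=None):
        rng = np.random.default_rng(rng)
        sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
        self.params = rng.standard_normal(sum(sizes)) * 0.1
        # Carve the flat array into views: writing into self.params
        # updates W1, b1, W2, b2 in place, and vice versa.
        offs = np.cumsum([0] + sizes)
        self.W1 = self.params[offs[0]:offs[1]].reshape(n_in, n_hidden)
        self.b1 = self.params[offs[1]:offs[2]]
        self.W2 = self.params[offs[2]:offs[3]].reshape(n_hidden, n_out)
        self.b2 = self.params[offs[3]:offs[4]]

    def cost_grad(self, flat_params, X, y):
        # Overwrite the parameter array in place; the views pick up
        # the optimizer's proposed values automatically.
        self.params[:] = flat_params
        # Forward pass: tanh hidden layer, linear output, squared error.
        h = np.tanh(X @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        err = out - y
        cost = 0.5 * np.mean(np.sum(err ** 2, axis=1))
        # Backward pass, in the same order as the views above.
        n = X.shape[0]
        d_out = err / n
        gW2 = h.T @ d_out
        gb2 = d_out.sum(axis=0)
        d_h = (d_out @ self.W2.T) * (1.0 - h ** 2)
        gW1 = X.T @ d_h
        gb1 = d_h.sum(axis=0)
        grad = np.concatenate([gW1.ravel(), gb1, gW2.ravel(), gb2])
        return cost, grad
```

The wrapper the email describes then costs one line: `minimize(net.cost_grad, net.params.copy(), args=(X, y), jac=True, method="CG")`, where `jac=True` tells SciPy the function returns both cost and gradient.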
