On Tue, May 15, 2012 at 12:12:34AM +0200, David Marek wrote:
> Hi,
>
> I have worked on the multilayer perceptron and I've got a basic
> implementation working. You can see it at
> https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp
> The most important part is the sgd implementation, which can be found here:
> https://github.com/davidmarek/scikit-learn/blob/gsoc_mlp/sklearn/mlp/mlp_fast.pyx
>
> I have encountered a few problems and I would like to know your opinion.
>
> 1) There are classes like SequentialDataset and WeightVector which are
> used in sgd for linear_model, but I am not sure if I should use them
> here as well. I have to do more with samples and weights than just
> multiply and add them together. I wouldn't be able to use numpy
> functions like tanh and do batch updates, would I? What do you think?

I haven't had a look at these classes myself, but I think working with raw
NumPy arrays is a better idea in terms of efficiency.
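For illustration, this is the kind of fully vectorized minibatch code that a
plain 2-D array makes easy (a rough, untested sketch of my own; the shapes
and names are made up, not taken from your branch):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(10, 4)                    # minibatch: 10 samples, 4 features
    weights_hidden = rng.randn(4, 5) * 0.1  # input -> hidden weights
    bias_hidden = np.zeros(5)

    # One call handles the whole minibatch; no per-sample Python loop.
    x_hidden = np.tanh(np.dot(X, weights_hidden) + bias_hidden)

As far as I can tell, SequentialDataset hands you one sample at a time, so
you would lose exactly these batched tanh / dot calls.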
> Am I missing something that would help me do everything I need with
> SequentialDataset? I implemented my own LossFunction because I need a
> vectorized version; I think that is the same problem.
>
> 2) I used Andreas' implementation as an inspiration and I am not sure
> I understand some parts of it:
>
> * Shouldn't the bias vector be initialized with ones instead of
>   zeros? I guess there is no difference.

If the training set is mean-centered, then absolutely, yes. Otherwise the
biases in the hidden layer should be initialized to the mean over the
training set of -Wx, where W are the initial weights. This ensures that the
activation function is near its linear regime.

> * I am not sure why the bias is updated with:
>       bias_output += lr * np.mean(delta_o, axis=0)
>   Shouldn't it be:
>       bias_output += lr / batch_size * np.mean(delta_o, axis=0)?

As Andy said, the former allows you to set the learning rate without taking
the batch size into account, which makes things a little simpler: np.mean
already divides by the batch size, so dividing by it again would shrink the
effective step as the batch grows.

> * Shouldn't the backward step for computing delta_h be:
>       delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)
>   where hidden.doutput is the derivative of the activation function for
>   the hidden layer?

Offhand that sounds right. You can use Theano as a sanity check for your
implementation.

David
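P.S. For concreteness, here is a rough, untested sketch of a full minibatch
update along the lines discussed above (tanh hidden layer, linear output
layer, squared error; all names are mine, not code from your branch):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(10, 4)   # minibatch of 10 samples, 4 features
    Y = rng.randn(10, 3)   # targets
    lr = 0.01

    weights_hidden = rng.randn(4, 5) * 0.1
    bias_hidden = np.zeros(5)
    weights_output = rng.randn(5, 3) * 0.1
    bias_output = np.zeros(3)

    # Forward pass over the whole minibatch.
    x_hidden = np.tanh(np.dot(X, weights_hidden) + bias_hidden)
    output = np.dot(x_hidden, weights_output) + bias_output

    # Backward pass for squared error; tanh'(a) = 1 - tanh(a)**2.
    delta_o = Y - output
    delta_h = np.dot(delta_o, weights_output.T) * (1.0 - x_hidden ** 2)

    # np.mean already divides by the batch size, so lr stays comparable
    # across batch sizes without an extra 1 / batch_size factor.
    weights_output += lr * np.dot(x_hidden.T, delta_o) / X.shape[0]
    bias_output += lr * np.mean(delta_o, axis=0)
    weights_hidden += lr * np.dot(X.T, delta_h) / X.shape[0]
    bias_hidden += lr * np.mean(delta_h, axis=0)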
