Hi David. I'll have a look at your code later today. Let me first answer your questions about my code.
On 05/15/2012 12:12 AM, David Marek wrote:
> Hi,
>
> 2) I used Andreas' implementation as an inspiration and I am not sure
> I understand some parts of it:
>
> * Shouldn't the bias vector be initialized with ones instead of
>   zeros?

I guess there is no difference, but I always initialize it with zeros. If you initialize it with ones, you might get out of the linear part of the nonlinearity. At the beginning, you definitely want to stay close to the linear part to have meaningful derivatives. What would be the reason to initialize with ones?

Btw, there is a paper by Bengio's group on how to initialize the weights in a "good" way. You should have a look at it, but I don't have the reference at the moment.

> * I am not sure why the bias is updated with:
>
>       bias_output += lr * np.mean(delta_o, axis=0)
>
>   Shouldn't it be:
>
>       bias_output += lr / batch_size * np.mean(delta_o, axis=0)?

Because of the mean, the batch size has no influence on the size of the gradient, if I'm not mistaken; dividing by batch_size again would shrink the update twice.

> * Shouldn't the backward step for computing delta_h be:
>
>       delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)
>
>   where hidden.doutput is the derivative of the activation function for
>   the hidden layer?

Yes, it should be. For the output layer, though, with softmax and a maximum-entropy (cross-entropy) loss, a lot of terms cancel and the derivative with respect to the output is linear. Try Wolfram Alpha if you don't believe me ;) I haven't really found a place with a good derivation of this; it is not very obvious.

> I hope my questions are not too stupid. Thank you.

Not at all.

Cheers,
Andy
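To make the "stay in the linear part" argument about bias initialization concrete, here is a small sketch. It assumes a tanh hidden unit (the original code may use a different nonlinearity); the function name dtanh is made up for illustration:

```python
import numpy as np

# derivative of tanh at pre-activation a: tanh'(a) = 1 - tanh(a)**2
def dtanh(a):
    return 1.0 - np.tanh(a) ** 2

# at a = 0 the unit sits in its linear regime and the gradient is maximal;
# a bias of 1 already pushes the unit towards saturation
print(dtanh(0.0))  # 1.0
print(dtanh(1.0))  # ~0.42
```

So a bias of zero keeps the initial derivatives as large as possible, which is the point above.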
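A quick numerical check of the batch-size point: the arrays below are made-up per-example deltas, not taken from David's code, but they show that np.mean over the batch axis already normalizes by the batch size:

```python
import numpy as np

rng = np.random.RandomState(42)

# made-up per-example output deltas: each row is one example's delta_o
delta_small = rng.randn(10, 5)
# the same examples repeated four times, i.e. a batch four times as large
delta_large = np.tile(delta_small, (4, 1))

# np.mean over the batch axis already divides by the batch size,
# so the size of the update does not grow with the batch
g_small = np.mean(delta_small, axis=0)
g_large = np.mean(delta_large, axis=0)

print(np.allclose(g_small, g_large))  # True
```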
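And for the skeptical, a finite-difference sketch of both backward-pass claims: with softmax and cross-entropy the output delta reduces to probabilities minus targets, while the hidden delta does need the activation's derivative. This assumes a tanh hidden layer; the names W_h, W_o and the data are invented for the check:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
X = rng.randn(6, 4)
y = np.eye(3)[rng.randint(0, 3, size=6)]   # one-hot targets
W_h = rng.randn(4, 5) * 0.1                # input -> hidden weights
W_o = rng.randn(5, 3) * 0.1                # hidden -> output weights

def loss(W_h):
    h = np.tanh(np.dot(X, W_h))
    p = softmax(np.dot(h, W_o))
    return -np.mean(np.sum(y * np.log(p), axis=1))

# analytic backward pass
h = np.tanh(np.dot(X, W_h))
p = softmax(np.dot(h, W_o))
delta_o = (p - y) / X.shape[0]                 # linear, thanks to the cancellation
delta_h = np.dot(delta_o, W_o.T) * (1 - h**2)  # tanh'(a) = 1 - tanh(a)**2
grad_Wh = np.dot(X.T, delta_h)

# finite-difference check of the gradient wrt W_h
eps = 1e-6
num = np.zeros_like(W_h)
for i in range(W_h.shape[0]):
    for j in range(W_h.shape[1]):
        Wp = W_h.copy(); Wp[i, j] += eps
        Wm = W_h.copy(); Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad_Wh, num, atol=1e-7))  # True
```

If delta_o were multiplied by a softmax derivative as well, or the (1 - h**2) factor were dropped from delta_h, the check fails, which is one way to convince yourself of the cancellation without Wolfram Alpha.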
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
