Hi David.
I'll have a look at your code later today.
Let me first answer your questions about my code.

On 05/15/2012 12:12 AM, David Marek wrote:
> Hi,
>
> 2) I used Andreas' implementation as an inspiration and I am not sure
> I understand some parts of it:
>   * Shouldn't the bias vector be initialized with ones instead of
> zeros? I guess there is no difference.
I always initialize it with zeros. If you initialize it with ones, you
might move out of the linear part of the nonlinearity. At the beginning,
you definitely want to stay close to the linear part to have meaningful
derivatives. What would be the reason to initialize with ones?
Btw, there is a paper by Bengio's group on how to initialize the
weights in a "good" way. You should have a look at that, but I don't
have the reference at the moment.
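For illustration, here is a minimal sketch of what I mean (the layer
sizes and the weight scale are made up for the example, not taken from
either of our implementations):

    import numpy as np

    rng = np.random.RandomState(0)
    n_in, n_hidden = 784, 100  # hypothetical layer sizes

    # Small random weights keep the pre-activations near zero, i.e. in
    # the roughly linear regime of tanh/sigmoid nonlinearities.
    weights_hidden = rng.uniform(-0.1, 0.1, size=(n_in, n_hidden))

    # Biases start at zero so they don't shift the activations away
    # from the linear part of the nonlinearity.
    bias_hidden = np.zeros(n_hidden)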
>   * I am not sure why the bias is updated with:
>     bias_output += lr * np.mean(delta_o, axis=0)
>     shouldn't it be:
>     bias_output += lr / batch_size * np.mean(delta_o, axis=0)?
By taking the mean (rather than the sum) over the batch, the batch size
has no influence on the magnitude of the gradient, if I'm not mistaken.
The division by batch_size is already built into np.mean, so dividing
by it again would scale the update down twice.
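A quick sanity check with made-up numbers (not from the actual code):

    import numpy as np

    delta_o = np.array([[0.2, -0.1],
                        [0.4,  0.3]])  # fake output deltas, batch_size = 2
    batch_size = delta_o.shape[0]

    # np.mean already divides the summed gradient by batch_size, so an
    # extra division would shrink the update by batch_size a second time.
    assert np.allclose(np.mean(delta_o, axis=0),
                       np.sum(delta_o, axis=0) / batch_size)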
>   * Shouldn't the backward step for computing delta_h be:
>     delta_h[:] = np.dot(delta_o, weights_output.T) * hidden.doutput(x_hidden)
>     where hidden.doutput is the derivative of the activation function for
> the hidden layer?
Yes, it should be. For the output layer, though, with softmax and the
cross-entropy (maximum entropy) loss, loads of stuff cancels and the
gradient wrt the pre-softmax activations is simply output minus target.
Try Wolfram Alpha if you don't believe me ;) I haven't really found a
place with a good derivation for this; it is not very obvious.
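Since it is hard to find written out, here is the cancellation sketched
in LaTeX notation (t is the one-hot target, a the pre-softmax
activations, y = softmax(a)):

    L = -\sum_k t_k \log y_k, \qquad y_k = \frac{e^{a_k}}{\sum_j e^{a_j}}

    \frac{\partial y_k}{\partial a_i} = y_k (\delta_{ki} - y_i)

    \frac{\partial L}{\partial a_i}
      = -\sum_k \frac{t_k}{y_k} \, y_k (\delta_{ki} - y_i)
      = -t_i + y_i \sum_k t_k
      = y_i - t_i \qquad (\text{since } \sum_k t_k = 1)

So delta_o is just y - t, and no extra derivative factor shows up for
the output layer.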
>
> I hope my questions are not too stupid. Thank you.
>
Not at all.

Cheers,
Andy
