On Wed, Jun 6, 2012 at 1:50 PM, xinfan meng <mxf3...@gmail.com> wrote:
>
> I think these two delta_o have the same meaning. If you have "Pattern
> Recognition and Machine Learning" by Bishop, you can find that Bishop use
> exactly the second formula in the back propagation algorithm. I suspect
> these two formulae lead to the same update iterations, but I can't see why
> now.
>
Thanks for the idea. I read the NN chapter there, and here is my explanation.
Bishop uses this forward step (page 245):
a_j = \sum_{i=0}^D w_{ji} x_i
z_j = tanh(a_j)
y_k = \sum_{j=0}^M w_{kj} z_j
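As a minimal numpy sketch of that forward step (the names W1, W2 and the sizes D, M, K are my own illustration, not from Bishop; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2                # inputs, hidden units, outputs (arbitrary)
x = rng.normal(size=D)           # input vector x_i
W1 = rng.normal(size=(M, D))     # hidden weights w_{ji}
W2 = rng.normal(size=(K, M))     # output weights w_{kj}

a = W1 @ x                       # a_j = \sum_i w_{ji} x_i
z = np.tanh(a)                   # z_j = tanh(a_j)
y = W2 @ z                       # y_k = \sum_j w_{kj} z_j  (linear output)
```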
He uses a linear (identity) activation at the output, because its
derivative is trivial to compute. ;-) So in this case, with
delta_j = y_j - t_j,
∇w_{ji} = delta_j * x_i = (y_j - t_j) * x_i
If you'd use another activation function instead, for example tanh, which
can be used for classification in the current implementation, you'd have
∇w = (y - t) * tanh'(w·x) * x
So both pages are correct because each uses different activation function.
The difference is that for f(w) = w*x you get
df/dw = x
while for f(w) = tanh(w*x) the chain rule gives
df/dw = tanh'(w*x) * x
(both derivatives are taken with respect to the weight w, not the input x).
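To check that both gradient formulas above are right, here is a small sketch for a single unit with squared error E = 0.5*(y - t)^2, comparing each analytic gradient against a central finite-difference estimate (the helper num_grad and all the concrete numbers are my own illustration):

```python
import numpy as np

x = np.array([0.5, -1.2, 2.0])   # input
w = np.array([0.3, 0.1, -0.7])   # weights
t = 0.4                          # target
eps = 1e-6

def num_grad(f):
    # Central-difference gradient of a scalar function of w.
    g = np.zeros_like(w)
    for i in range(len(w)):
        dw = np.zeros_like(w)
        dw[i] = eps
        g[i] = (f(w + dw) - f(w - dw)) / (2 * eps)
    return g

# Linear unit: y = w.x, so dE/dw = (y - t) * x
y = w @ x
g_lin = (y - t) * x
assert np.allclose(g_lin, num_grad(lambda v: 0.5 * (v @ x - t) ** 2))

# tanh unit: y = tanh(w.x), tanh'(a) = 1 - y**2,
# so dE/dw = (y - t) * (1 - y**2) * x
y = np.tanh(w @ x)
g_tanh = (y - t) * (1 - y ** 2) * x
assert np.allclose(g_tanh,
                   num_grad(lambda v: 0.5 * (np.tanh(v @ x) - t) ** 2))
```

The extra (1 - y**2) factor is exactly the tanh'(w·x) term that distinguishes the two update rules.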
David
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general