On Wed, Jun 6, 2012 at 1:50 PM, xinfan meng <mxf3...@gmail.com> wrote:
>
> I think these two delta_o have the same meaning. If you have "Pattern
> Recognition and Machine Learning" by Bishop, you can find that Bishop use
> exactly the second formula in the back propagation algorithm. I suspect
> these two formulae lead to the same update iterations, but I can't see why
> now.
>

Thanks for the idea. I read the NN chapter there, and here is my
explanation. Bishop uses this forward step (page 245):

a_j = \sum_{i=0}^D w_{ji} x_i
z_j = tanh(a_j)
y_k = \sum_{j=0}^M w_{kj} z_j

He uses a linear activation function for the output layer, because its
derivative is easy to compute. ;-) So in this case

∇w_{ij} = delta_j * x_i = (y_j - t_j) * x_i
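As a sanity check, here is a minimal sketch (a hypothetical single linear
unit with squared error, not the scikit-learn code) comparing this gradient
against finite differences:

```python
import numpy as np

# Hypothetical single linear unit: y = w . x, squared error E = 0.5 * (y - t)**2.
# Then dE/dw_i = (y - t) * x_i, i.e. delta * x_i with delta = y - t.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
t = 0.7

y = w @ x
grad = (y - t) * x  # analytic gradient

# Central finite-difference approximation of dE/dw.
eps = 1e-6
fd = np.array([
    (0.5 * ((w + eps * e) @ x - t) ** 2
     - 0.5 * ((w - eps * e) @ x - t) ** 2) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad, fd, atol=1e-5))
```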

If you used another activation function, for example tanh (which the
current implementation can use for classification), you'd have

∇w = (y - t) * tanh'(a) * x,   where a = w·x and tanh'(a) = 1 - tanh²(a)
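The same kind of check for the tanh case (again a hypothetical single unit;
the chain-rule factor 1 - y² is the only difference from the linear case):

```python
import numpy as np

# Hypothetical single tanh unit: y = tanh(w . x), E = 0.5 * (y - t)**2.
# Chain rule: dE/dw_i = (y - t) * (1 - y**2) * x_i, since tanh'(a) = 1 - tanh(a)**2.
rng = np.random.default_rng(1)
w = rng.normal(size=3)
x = rng.normal(size=3)
t = 0.3

y = np.tanh(w @ x)
grad = (y - t) * (1.0 - y ** 2) * x  # analytic gradient via the chain rule

# Central finite-difference approximation of dE/dw.
eps = 1e-6
E = lambda wv: 0.5 * (np.tanh(wv @ x) - t) ** 2
fd = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, fd, atol=1e-5))
```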

So both pages are correct; each just uses a different activation function.

The difference (taking the derivative with respect to the weight w, not x):
for f(w) = w*x you get

df/dw = x

while for f(w) = tanh(w*x) you get

df/dw = tanh'(w*x) * x = (1 - tanh²(w*x)) * x
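In scalar form, these two weight derivatives can be checked numerically
(stdlib only; the constants are arbitrary):

```python
import math

# For f(w) = w*x:        df/dw = x
# For f(w) = tanh(w*x):  df/dw = (1 - tanh(w*x)**2) * x
w, x, eps = 0.4, 1.3, 1e-6

fd_linear = ((w + eps) * x - (w - eps) * x) / (2 * eps)
fd_tanh = (math.tanh((w + eps) * x) - math.tanh((w - eps) * x)) / (2 * eps)

print(abs(fd_linear - x) < 1e-6)
print(abs(fd_tanh - (1 - math.tanh(w * x) ** 2) * x) < 1e-6)
```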

David
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
