The Python authors' comments here explain (well, they assert) why we're
doing that filtering for hidden_layer > 0:
" Now we have the gradient on the outputs of the hidden layer. Next, we
have to backpropagate the ReLU non-linearity. This turns out to be easy
because ReLU during the backward pass is effectively a switch. Since
r=max(0,x), we have that dr/dx = 1 (x>0). Combined with the chain
rule, we see that the ReLU unit lets the gradient pass through unchanged
if its input was greater than 0, but kills it if its input was less than
zero [or equal to zero - Mike's edit] during the forward pass."
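For concreteness, here's a tiny made-up illustration of that switch in J (toy numbers of my own, not from the script): the backward pass is just an elementwise multiply by the boolean mask x > 0.
x =: _2 _1 0 1 2
0 >. x                   NB. forward pass:  r = max(0,x)
0 0 0 1 2
gout =: 10 20 30 40 50   NB. made-up upstream gradient dL/dr
(x > 0) * gout           NB. backward pass: gradient passes where x > 0, is zeroed elsewhere
0 0 0 40 50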
Isn't it curious that the J-way of doing it,
if. # ilow=. (<"1@:($ #: I.@:(0 >: ,))) hidden_layer do.  NB. find indices of elements <: 0
  dhidden =. 0 ilow } dhidden
end.
is much slower than the naive
dhidden =. (hidden_layer >0) * dscores dotT W2
?
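As a sanity check (toy data of my own, not from the script), the two do compute the same dhidden:
hl =: 3 4 $ 0.3 _0.1 0 0.7  _0.2 0.5 0.1 _0.4  0.6 0 _0.3 0.2   NB. stand-in hidden_layer, some elements <: 0
dh =: 3 4 $ >: i. 12                                            NB. stand-in for dscores dotT W2
ilow =: (<"1@:($ #: I.@:(0 >: ,))) hl                           NB. boxed indices of elements <: 0
(0 ilow } dh) -: (hl > 0) * dh                                  NB. amend result matches the mask result
1
My guess is that building the boxed index list and doing a scattered amend costs far more than one flat boolean multiply over the whole array, but I haven't profiled it.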
Mike
On 15/05/2019 23:37, Brian Schott wrote:
BINGO is right.
That did it.
I'll have to look at why.
I tried using normalrand from stats, but that did not change the results
the way the BINGO fix did.
Thank you so much,
On Wed, May 15, 2019 at 6:30 PM 'Mike Day' via Programming <
[email protected]> wrote:
BINGO!
I'd misread that hidden_layer line. Try this rather inefficient
work-around, which does achieve what they intend:
dhidden =. (hidden_layer >0) * dscores dot |:W2
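(The transpose is just to get the shapes right: dscores is N by K and W2 is H by K, which is 100 by 3 with these sizes, so dscores dot |:W2 lands back on the N by H shape of hidden_layer. A quick check with zero matrices of those sizes, assuming dot is the usual +/ . * :)
dsc =: 300 3 $ 0         NB. stand-in for dscores, N by K
W2a =: 100 3 $ 0         NB. stand-in for W2, H by K
$ dsc (+/ . *) |: W2a    NB. same shape as hidden_layer
300 100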
and then, Hey Presto!
cc =: train 10000
0 1.09876
1000 0.556147
2000 0.2535
3000 0.240415
4000 0.238063
5000 0.237042
6000 0.236117
7000 0.235589
8000 0.23537
9000 0.235135
10000 0.235025
$h_l =. 0>.(>1{cc) +"1 X dot >0{cc
300 100
$sc =. (>3{cc) +"1 h_l dot >2{cc
300 3
$predicted_class =. (i.>./)"1 sc
300
mean predicted_class = classes
0.996667
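(In case it helps anyone following along: (i.>./)"1 is just a row-wise argmax, and mean is presumably the usual +/ % #. A made-up example:)
sc0 =: 3 3 $ 0.1 0.7 0.2  0.9 0.05 0.05  0.2 0.2 0.6
(i. >./)"1 sc0     NB. index of the largest score in each row
1 0 2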
I'm using rand1 =: 0.01 * rnorm, but that might not matter much...
And so to bed,
Mike
--
(B=)