Mike, I agree that 'symbol' works better than 'dot' in plot.
I can't think why you got the nonce error, but I was trying to reproduce the following sequence from Python. It surprised me a little, as I had originally used something more like yours:

  dhidden = np.dot(dscores, W2.T)
  # backprop the ReLU non-linearity
  dhidden[hidden_layer <= 0] = 0

I have read your follow-on messages, and they reminded me that I renamed their y as classes, because J verbs don't like the name y for non-arguments. And yes, classes are 0s, 1s and 2s.

They are using normal variates with µ=0 and variance 1, so I attempted to supply "uniform" variates with a similar mean and variance, though of course not exactly the same relative frequencies. Uniform variates have variance (b-a)^2/12, so (1-_1)^2/12 = 4/12 = 1/3, and I multiplied each uniform variate by %:3 to adjust. I hope that makes sense, and I hope it does not produce such a great difference, but I suppose I can get normal variates and experiment. Good idea. (A couple of quick numpy sketches are appended after your quoted message below: one contrasting that ReLU backprop mask with a simple clamp, and one checking the %:3 scaling.)

Wrt your question "Does indexing by "classes" (all 3s) have the same effect as their y ?", I hope so.

Thanks very much,

On Wed, May 15, 2019 at 4:51 PM 'Mike Day' via Programming <[email protected]> wrote:

> I've had a look at your example and the source you cite. You differ from
> the source in seeming to need explicit handling of the hidden layer with
> both W & b AND W2 & b2, which I can't understand right now.
>
> Ah - I've just found a second listing, lower down the page, which does
> have W2 and b2 and a hidden layer!
>
> I found, at least in Windows 10, that 'dot' plot shows more or less
> white space; 'symbol' plot is better.
>
> Anyway, when I first ran train, I got:
>
>    train 100
> |nonce error: train
> |   dhidden=.0 indx}dhidden
>
> The trouble arose from this triplet of lines:
>
>   dhidden =. dscores dot |:W2
>   indx =. I. hidden_layer <: 0
>   dhidden =. 0 indx}dhidden
>
> Since you seem to be restricting dhidden to be non-negative, I replaced
> these three with:
>
>   dhidden =. 0 >. dscores dot |:W2   NB. is this what you meant?
>
> I've also changed the loop so that we get a report for the first cycle,
> as in Python:
>
>   for_i. i. >: y do.
>
> and added this line after smoutput i,loss - might not be necessary in
> Darwin...
>
>   wd'msgs'
>
> With these changes, train ran as follows:
>
>    cc =: train 10000   NB. loss starts ok, increases slightly, still unlike the Python ex!
> 0 1.09856
> 1000 1.10522
> 2000 1.10218
> 3000 1.0997
> 4000 1.09887
> 5000 1.09867
> 6000 1.09862
> 7000 1.09861
> 8000 1.09861
> 9000 1.09861
>
>    $h_l =. 0>.(>1{cc) +"1 X dot >0{cc
> 300 100
>    $sc =. (>3{cc) +"1 h_l dot >2{cc
> 300 3
>    $predicted_class =. (i.>./)"1 sc
> 300
>    mean predicted_class = classes
> 0.333333
>
> Why are the cycle 0 losses different, if only slightly? They report
> 1.098744 cf your 1.09856.
>
> Sorry - only minor problems found - they don't explain why you don't
> reproduce their results more closely,
>
> Mike
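A minimal numpy sketch of that ReLU backprop point, with purely illustrative shapes, names and seed (nothing here is taken from either listing): masking dhidden where hidden_layer <= 0 is not in general the same as clamping dhidden at zero (the 0 >. version), because a positive gradient entry survives the clamp but is zeroed by the mask whenever the corresponding hidden unit was inactive.

  import numpy as np

  rng = np.random.default_rng(1)
  hidden_layer = np.maximum(0, rng.standard_normal((4, 5)))   # forward ReLU output
  dscores = rng.standard_normal((4, 3))
  W2 = rng.standard_normal((5, 3))

  dhidden = np.dot(dscores, W2.T)

  masked = dhidden.copy()
  masked[hidden_layer <= 0] = 0        # zero the gradient where the ReLU was inactive
  clamped = np.maximum(0, dhidden)     # clamp the gradient itself at zero

  print(np.allclose(masked, clamped))  # generally False - the two operations differ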

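A quick numpy check of the uniform-variate scaling (seed and sample size are arbitrary, just for the check): a uniform variate on (_1, 1) has mean 0 and variance (b-a)^2/12 = 1/3, so multiplying by sqrt(3), i.e. %:3, brings the variance to about 1.

  import numpy as np

  rng = np.random.default_rng(0)
  u = rng.uniform(-1.0, 1.0, size=100000)
  print(u.mean(), u.var())             # roughly 0 and 1/3

  scaled = np.sqrt(3.0) * u            # same adjustment as multiplying by %:3 in J
  print(scaled.mean(), scaled.var())   # roughly 0 and 1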